The goal of this Notebook is to take the parts of NovemberPhyloProc.ipynb that I expect to go into the actual manuscript.
Figure 1. Antibodies over time.
Figure 2. Weighted unifrac PCoA
Table 1. Kernel regression and weighted unifrac 1 test - binomial
Table S1. Kernel regression and weighted unifrac 1 test - gaussian
Table S2. Weighted unifrac components 2-N
Figure S1. Jensen Shannon at different agglomeration levels.
Figure S2. Statistically significant (p< 0.05, q < 0.2) family genus and species abundances (clr transformed) regressed against weighted unifrac 1.
Table S3. All family - genus and species vs antibody vs glm scores.
Figure 3. Stacked bars of key groups ordered by weighted unifrac axis 1.
Figure S3. Groups associated with IgGs.
Figure 4. Proportionality heat-map
3/20/2017 Re-ran whole pipeline end-to-end. Hits on everything (including igg gp41 0 day) except only trending on gp41 Month 6.5. However I don’t see family level groups for gp41 that relate to community structure (q < 0.2, p < 0.05). I may just try running everything end to end a few more times to see what happens. This because I want to see how consistant (or otherwise) the results are between runs. Also, I’m going to start setting seeds now.
# only use library paths in the anaconda environment
#.libPaths(grep('anaconda3', .libPaths(), value = T))
.libPaths()
[1] "/data/users/cram/Project/Nyvac_096_Microbiome/packrat/lib/x86_64-pc-linux-gnu/3.6.0"
[2] "/data/users/cram/Project/Nyvac_096_Microbiome/packrat/lib-ext"
[3] "/data/users/cram/Project/Nyvac_096_Microbiome/packrat/lib-R"
R.version
_
platform x86_64-pc-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status beta
major 3
minor 6.0
year 2019
month 04
day 11
svn rev 76379
language R
version.string R version 3.6.0 beta (2019-04-11 r76379)
nickname Planting of a Tree
sessionInfo()
R version 3.6.0 beta (2019-04-11 r76379)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] cluster_2.0.7-1 gdtools_0.1.7 bindrcpp_0.2.2 breakaway_4.6.11 qvalue_2.10.0
[6] purrrlyr_0.0.3 car_3.0-2 carData_3.0-2 MiRKAT_1.0.1 MASS_7.3-51.1
[11] GUniFrac_1.1 matrixStats_0.54.0 ape_5.2 PearsonDS_1.1 survival_2.43-1
[16] boot_1.3-20 kableExtra_0.9.0 knitr_1.20 broom_0.5.0 compositions_1.40-2
[21] bayesm_3.1-0.1 energy_1.7-5 robustbase_0.93-3 tensorA_0.36.1 vegan_2.5-3
[26] lattice_0.20-38 permute_0.9-4 gridExtra_2.3 forcats_0.3.0 stringr_1.3.1
[31] dplyr_0.7.7 purrr_0.2.5 readr_1.1.1 tidyr_0.8.2 tibble_1.4.2
[36] ggplot2_3.1.0 tidyverse_1.2.1 phyloseq_1.22.3 Cairo_1.5-9
loaded via a namespace (and not attached):
[1] readxl_1.1.0 backports_1.1.2 fastmatch_1.1-0 selectr_0.4-1
[5] plyr_1.8.4 igraph_1.2.2 lazyeval_0.2.1 splines_3.6.0
[9] BiocParallel_1.12.0 listenv_0.7.0 GenomeInfoDb_1.14.0 digest_0.6.18
[13] foreach_1.4.4 htmltools_0.3.6 lars_1.2 viridis_0.5.1
[17] fansi_0.4.0 magrittr_1.5 openxlsx_4.1.0 globals_0.12.4
[21] Biostrings_2.46.0 modelr_0.1.2 RcppParallel_4.4.1 svglite_1.2.1
[25] rsample_0.0.5 colorspace_1.3-2 rvest_0.3.2 haven_1.1.2
[29] crayon_1.3.4 RCurl_1.95-4.11 jsonlite_1.5 bindr_0.1.1
[33] dada2_1.6.0 phangorn_2.4.0 iterators_1.0.10 glue_1.3.0
[37] gtable_0.2.0 zlibbioc_1.24.0 XVector_0.18.0 DelayedArray_0.4.1
[41] BiocGenerics_0.24.0 DEoptimR_1.0-8 abind_1.4-5 scales_1.0.0
[45] mvtnorm_1.0-8 som_0.3-5.1 Rcpp_0.12.19 viridisLite_0.3.0
[49] mclust_5.4.1 foreign_0.8-71 stats4_3.6.0 httr_1.3.1
[53] RColorBrewer_1.1-2 chemometrics_1.4.2 pkgconfig_2.0.2 nnet_7.3-12
[57] utf8_1.1.4 tidyselect_0.2.5 labeling_0.3 rlang_0.4.0
[61] reshape2_1.4.3 munsell_0.5.0 cellranger_1.1.0 tools_3.6.0
[65] cli_1.0.1 generics_0.0.2 pls_2.7-0 ade4_1.7-13
[69] evaluate_0.12 biomformat_1.6.0 yaml_2.2.0 zip_1.0.0
[73] packrat_0.4.8-1 future_1.14.0 nlme_3.1-137 xml2_1.2.0
[77] compiler_3.6.0 rstudioapi_0.8 curl_3.2 e1071_1.7-0
[81] pcaPP_1.9-73 stringi_1.2.4 Matrix_1.2-15 multtest_2.34.0
[85] pillar_1.3.0 furrr_0.1.0 data.table_1.11.8 cowplot_0.9.3
[89] bitops_1.0-6 GenomicRanges_1.30.3 R6_2.3.0 latticeExtra_0.6-28
[93] hwriter_1.3.2 ShortRead_1.36.1 rio_0.5.10 IRanges_2.12.0
[97] codetools_0.2-15 assertthat_0.2.0 rhdf5_2.22.0 SummarizedExperiment_1.8.1
[101] rprojroot_1.3-2 withr_2.1.2 GenomicAlignments_1.14.2 Rsamtools_1.30.0
[105] S4Vectors_0.16.0 GenomeInfoDbData_1.0.0 mgcv_1.8-25 parallel_3.6.0
[109] hms_0.4.2 rpart_4.1-15 quadprog_1.5-5 grid_3.6.0
[113] class_7.3-15 rmarkdown_1.10 base64enc_0.1-3 Biobase_2.38.0
[117] lubridate_1.7.4
# https://stackoverflow.com/questions/46354826/have-a-function-that-calls-library-and-takes-either-a-package-or-its-name-as-inp
# Also return package version when loading in packages
# accept strings or functions
libver <- function(pac){
pac <- as.character(substitute(pac))
library(pac, character.only=TRUE)
packageVersion(pac)
}
#libver("dada2")
#libver("ggplot2")
libver("Cairo")
[1] ‘1.5.9’
# Much of the data handling
libver('phyloseq')
[1] ‘1.22.3’
# A bunch of environments, including ggplot, dplyr, tidyr, and broom, which I use a lot
libver('tidyverse')
[1] ‘1.2.1’
# Mostly for concatenating ggplots
library(gridExtra); packageVersion("gridExtra")
[1] ‘2.3’
# I use this surprisingly not a lot here.
library(vegan); packageVersion("vegan")
[1] ‘2.5.3’
# For making trees
# libver('phangorn')
# A prerequesite to phangorn
# libver("DECIPHER")
# Some pre-processing stuff
# libver("dada2")
# I usually reshape with tidyverse tools now, but melt and cast are often easier in a pinch
# libver("reshape2")
# For replacing NaNs without too much thought.
# libver("imputeMissings")
# Deal with proportional data, especially useful for calculating proportionality phi
libver('compositions')
[1] ‘1.40.2’
# Works with tidyverse to make model output tidy
libver('broom')
[1] ‘0.5.0’
# Make pretty tables
libver('knitr')
[1] ‘1.20’
libver('kableExtra')
[1] ‘0.9.0’
# Let those pretty tables actually show up in a jupyter notebook
#library('IRdisplay')
# For bootstrapping
libver('boot')
[1] ‘1.3.20’
# Calculate kernel regressions
libver("MiRKAT")
[1] ‘1.0.1’
libver("car")
[1] ‘3.0.2’
#libver(mclust)
#libver(chemometrics)
libver(purrrlyr)
[1] ‘0.0.3’
libver('qvalue')
[1] ‘2.10.0’
libver("breakaway")
[1] ‘4.6.11’
I have put the functions in a library file
source('libraries/library096.R')
# Set upOriginal to false, if you want to used user-reprocessed data.
# Results may differ slightly from those in the manuscript due to inter-run variation
# especially in the tree-ing algorithm.
upOriginal <- TRUE
# upOriginal <- FALSE
# For permutation tests, how fast do things need to run
# 9999 for most runs, 99999 for publication quality ones suggested
jnperm <- 99999
# Data paths
getwd()
[1] "/data/users/cram/Project/Nyvac_096_Microbiome"
(mapping_file_path <- file.path('data', 'mapping_file_096a.csv'))
[1] "data/mapping_file_096a.csv"
(immune_file_path <- file.path('data', 'immune096b.csv'))
[1] "data/immune096b.csv"
if(upOriginal){
seqtab_file_path <- file.path('data', 'seqtab.nochimNov2017.csv')
taxa_file_path <- file.path('data', 'TaxaNov2017.csv')
tree_path <- file.path('data', 'phylogeny096NovTree.tre')
} else {
seqtab_file_path <- file.path('data1', 'seqtab.nochimMar2018.csv')
taxa_file_path <- file.path('data1', 'TaxaMar2018.csv')
tree_path <- file.path('data1', 'phylogeny096Mar2018tre.tre')
}
seqtab_file_path
[1] "data/seqtab.nochimNov2017.csv"
taxa_file_path
[1] "data/TaxaNov2017.csv"
tree_path
[1] "data/phylogeny096NovTree.tre"
# Sequence data
seqtab.nochim.data <- read.csv(seqtab_file_path)
seqtabNames = gsub('\\.', '-',
gsub('.fastq', '', seqtab.nochim.data$X)
)
seqtab.nochim = as.matrix(seqtab.nochim.data[,-1])
rownames(seqtab.nochim) = seqtabNames
# Taxa names
taxa.data <- read.csv(taxa_file_path)
taxa = taxa.data[,-1]
## I reverse complemented the sequences to generate the taxonomy
# (but only in this latest re-run, not the original)
## The following undoes that reverse complement to get original sequence
#rownames(taxa) = dada2:::rc(taxa.data[,1])
if(upOriginal){
rownames(taxa) = (taxa.data[,1])} else {
rownames(taxa) = dada2:::rc(taxa.data[,1])
}
taxa <- as.matrix(taxa)
# Mapping file
mapping.data <- read_csv(mapping_file_path) %>%
mutate(pub_id = sapply(pub_id, function(x) {as.numeric(gsub("096-", "", x))}))
Parsed with column specification:
cols(
SampleID = col_character(),
BarcodeSequence = col_character(),
LinkerPrimerSequence = col_character(),
ReversePrimer = col_character(),
run_prefix = col_character(),
pub_id = col_character(),
Sex = col_character(),
Visit = col_integer(),
visitRank = col_integer(),
RXCode = col_character(),
Description = col_character()
)
#mapping = mapping.data[,-1]
#rownames(mapping) = mapping.data[,1]
#mapping <- as.matrix(mapping)
head(mapping.data)
# Immune Data
immune.data0 <- read_csv(immune_file_path)
Parsed with column specification:
cols(
visitno = col_integer(),
rx_code = col_character(),
type = col_character(),
antigen = col_character(),
mag = col_double(),
mag_bl = col_double(),
response = col_integer(),
day = col_integer(),
month = col_double(),
ct = col_character(),
response_j = col_double(),
assay = col_character(),
pub_id = col_character()
)
immune.data <- mutate(immune.data0, pub_id = sapply(pub_id, function(x) {as.numeric(gsub("096-", "", x))}))
levels(immune.data$antigen) <- gsub("[ /]", ".", levels(immune.data$antigen))
head(immune.data)
# Phylogenetic tree
seqs <- dada2::getSequences(seqtab.nochim)
names(seqs) <- seqs
pt <- ape::read.tree(file=tree_path)
pt2 <- phangorn::midpoint(pt)
immune.data$antigen %>% unique
[1] "gp41" "p24" "Con.6.gp120.B" "ZM96.gp140" "gp70_B.CaseA_V1_V2"
[6] "ANY.ENV.PTEG"
Save options to a variable
par0 <- options()
## minimal sample identification data
pub_id_key <- unique(immune.data[,c("pub_id", "rx_code", "ct")])
sample_sm0 <- dplyr::select(mapping.data, SampleID, pub_id, sex = Sex, muVisit = Visit, muVisitRank = visitRank)
sample_sm <- left_join(sample_sm0, pub_id_key, by = "pub_id") %>%
as.data.frame %>%
tibble::column_to_rownames(var = "SampleID")
Column `pub_id` has different attributes on LHS and RHS of join
# rownames(sample_sm)
# head(sample_sm)
# Make raw phyloseq object
ot <- otu_table(seqtab.nochim, taxa_are_rows=FALSE)
tt <- tax_table(taxa)
dimnames(tt) = dimnames(taxa)
spl <- sample_data(sample_sm)
psN is a really raw phyloseq object * OTU names are given as accession numbers * Numbers are in total counts * We have samples from both time points
# Quite raw phyloseq object. Species names are given as accession numbers
psN <- phyloseq(ot, tt, spl, pt2)
psN
phyloseq-class experiment-level object
otu_table() OTU Table: [ 960 taxa and 47 samples ]
sample_data() Sample Data: [ 47 samples by 6 sample variables ]
tax_table() Taxonomy Table: [ 960 taxa by 8 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 960 tips and 959 internal nodes ]
I want to make a phyloseq object for use in essentally all of the subsequent analyses. Features include: * Some basic taxonomic pre-processing. * No uncharacterized phyla. * Only OTUs that show up at least 10% of the time in the final data set . * Do this after filtering samples. * No tip-glomming. I’ll save that untill later. * Immune data is included in the sample data table. * We’ll do Andrew’s representitive IgGs and IgAs. * We only have samples from visit 1. * We only have samples from experemental (not control) groups.
immune.data %>% pull(type) %>% unique
[1] "IgA" "IgG" "CD4+"
#immune.data %>% unite(type_antigen ,type, antigen, sep = "_")
immune.data %>% dplyr::select(pub_id, month, type, antigen, mag) %>%
filter(month %in% c(0, 6.5, 12)) %>%
unite(type_antigen, type, antigen, sep = "_") %>%
unite(type_antigen_month,type_antigen, month, sep = "_Month_") %>%
spread(key = type_antigen_month, value = mag, drop = TRUE) -> immune.table
immune.table %>% head
Some investegation suggested by the phyloseq tutorials to identify phyla for removal, and to identify an abundance threshold
psN
phyloseq-class experiment-level object
otu_table() OTU Table: [ 960 taxa and 47 samples ]
sample_data() Sample Data: [ 47 samples by 6 sample variables ]
tax_table() Taxonomy Table: [ 960 taxa by 8 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 960 tips and 959 internal nodes ]
psN %>% subset_samples(!is.na(pub_id))
phyloseq-class experiment-level object
otu_table() OTU Table: [ 960 taxa and 38 samples ]
sample_data() Sample Data: [ 38 samples by 6 sample variables ]
tax_table() Taxonomy Table: [ 960 taxa by 8 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 960 tips and 959 internal nodes ]
# skip the blanks
psN %>% subset_samples(!is.na(pub_id)) %>%
# OTUs must be characterized to phylum
subset_taxa(!is.na(Phylum)& !Phylum %in% c("", "uncharacterized")) -> psN_hasPhylum
psN_hasPhylum
phyloseq-class experiment-level object
otu_table() OTU Table: [ 929 taxa and 38 samples ]
sample_data() Sample Data: [ 38 samples by 6 sample variables ]
tax_table() Taxonomy Table: [ 929 taxa by 8 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 929 tips and 928 internal nodes ]
# from 960 to 929 otus
Identifying and removing phyla with very few taxa in them
prevdf = apply(X = otu_table(psN_hasPhylum),
MARGIN = ifelse(taxa_are_rows(psN_hasPhylum), yes = 1, no = 2),
FUN = function(x){sum(x > 0)})
# Add taxonomy and total read counts to this data.frame
prevdf = data.frame(Prevalence = prevdf,
TotalAbundance = taxa_sums(psN_hasPhylum),
tax_table(psN_hasPhylum))
plyr::ddply(prevdf, "Phylum", function(df1){cbind(mean(df1$Prevalence),sum(df1$Prevalence))})
filterPhyla = c("Verrucomicrobia", "Tenericutes", "Elusimicrobia", "Synergistetes")
psN_MainPhyla = subset_taxa(psN_hasPhylum, !Phylum %in% filterPhyla)
psN_MainPhyla
phyloseq-class experiment-level object
otu_table() OTU Table: [ 922 taxa and 38 samples ]
sample_data() Sample Data: [ 38 samples by 6 sample variables ]
tax_table() Taxonomy Table: [ 922 taxa by 8 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 922 tips and 921 internal nodes ]
# Determining abundance threshold
prevdf1 = subset(prevdf, Phylum %in% get_taxa_unique(psN_MainPhyla, "Phylum"))
ggplot(prevdf1, aes(TotalAbundance, Prevalence / nsamples(psN_hasPhylum),color=Phylum)) +
# Include a guess for parameter
geom_hline(yintercept = 0.05, alpha = 0.5, linetype = 2) + geom_point(size = 2, alpha = 0.7) +
scale_x_log10() + xlab("Total Abundance") + ylab("Prevalence [Frac. Samples]") +
facet_wrap(~Phylum) + theme(legend.position="none")
# Determining abundance threshold
prevdf1 = subset(prevdf, Phylum %in% get_taxa_unique(psN_MainPhyla, "Phylum"))
ggplot(prevdf1, aes(TotalAbundance, Prevalence / nsamples(psN_hasPhylum),color=Phylum)) +
# Include a guess for parameter
geom_hline(yintercept = 0.05, alpha = 0.5, linetype = 2) + geom_point(size = 2, alpha = 0.7) +
scale_x_continuous() + xlab("Total Abundance") + ylab("Prevalence [Frac. Samples]") +
facet_wrap(~Phylum) + theme(legend.position="none")
Make a phyloseq object like psN2 but with all participants, psN2A Code copied from above.
(phyloseq object of relative abundances)
And psN1 (phyloseq object of counts)
psN1A and psN2A include all participants, and will be used to look at variability within participants
psN %>%
# add all the immune data
phylo_join(immune.table, by = "pub_id") %>%
# only use data from humans (no extraction controls)
subset_samples(is.finite(muVisitRank)) %>%
# only otus from known taxa that show up frequently enough
subset_taxa(!is.na(Phylum)& !Phylum %in% c("", "uncharacterized")) %>%
subset_taxa(!Phylum %in% filterPhyla) %>%
# only otus that show up in at least 10% of samples
prevalence_filter_taxa %>%
# convert to relative abundance
tag_phyloseq%>%
# Instead of naming each taxon with its full sequence, we use the "tag" instead
swap.phyloseq.taxnames %>%
pass -> psN1A # Save pre relative abundance transformation
Setting row names on a tibble is deprecated.
# add is-male
manColumn <- psN1A %>% sample_data %>% as('data.frame') %>% rownames_to_column %>% mutate(isMale = testIsMaleVec(sex)) %>% dplyr::select(rowname, isMale)
psN1A <- phylo_join(psN1A, manColumn, by = 'rowname')
## psN2 is like psN1 but with relative abundances
psN1A %>%
transform_sample_counts(function(x) {x/sum(x)}) %>%
# The "tag" is a new name that takes into account the rest of the taxonomy data
# the tag may need to be updated after any agglomeration
pass-> psN2A
# filter to just microbiome visit 1 and experemental treatments
psN1A %>%
subset_samples(muVisitRank == 1) %>%
subset_samples(ct == "T") %>%
pass -> psN1
psN2A %>%
subset_samples(muVisitRank == 1) %>%
subset_samples(ct == "T") %>%
pass -> psN2
# Calculate weighted unifrac distances and role those in.
psN2.wuf <- phyloseq::distance(psN2, method = "wunifrac")
psN2.pcoa <- capscale(psN2.wuf ~ 1)
psN2.pcoa.df <- psN2.pcoa %>% scores(display = "sites") %>%
as.data.frame %>%
rownames_to_column %>%
dplyr::select('rowname', 'MDS1', 'MDS2') %>%
mutate(rMDS1 = rank(MDS1)) %>% # rank order of MDS1
mutate(rrMDS1 = formatC(format = "d", rMDS1, flag = "0", width=ceiling(log10(max(rMDS1))))) %>%
unite(newname, rrMDS1, rowname, sep = "_", remove = FALSE) %>%
dplyr::select(-rrMDS1)
psN2 %>%
phylo_join(
psN2.pcoa.df,
by = 'rowname'
) -> psN2
## Even if the data are counts,
## the weighted unifrac pcoa is still done on the relative abundances
psN1 %>%
phylo_join(
psN2.pcoa.df,
by = 'rowname'
) -> psN1
psN2A
phyloseq-class experiment-level object
otu_table() OTU Table: [ 651 taxa and 38 samples ]
sample_data() Sample Data: [ 38 samples by 31 sample variables ]
tax_table() Taxonomy Table: [ 651 taxa by 10 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 651 tips and 650 internal nodes ]
psN1A
phyloseq-class experiment-level object
otu_table() OTU Table: [ 651 taxa and 38 samples ]
sample_data() Sample Data: [ 38 samples by 31 sample variables ]
tax_table() Taxonomy Table: [ 651 taxa by 10 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 651 tips and 650 internal nodes ]
psN2
phyloseq-class experiment-level object
otu_table() OTU Table: [ 651 taxa and 21 samples ]
sample_data() Sample Data: [ 21 samples by 35 sample variables ]
tax_table() Taxonomy Table: [ 651 taxa by 10 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 651 tips and 650 internal nodes ]
psN1
phyloseq-class experiment-level object
otu_table() OTU Table: [ 651 taxa and 21 samples ]
sample_data() Sample Data: [ 21 samples by 35 sample variables ]
tax_table() Taxonomy Table: [ 651 taxa by 10 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 651 tips and 650 internal nodes ]
How many taxa were still present after each filtering step?
# Find number of taxa in available samples
psN %>%
# add all the immune data
phylo_join(immune.table, by = "pub_id") %>%
# filter to just microbiome visit 1 and experemental treatments
subset_samples(muVisitRank == 1) %>%
subset_samples(ct == "T") %>%
prevalence_filter_taxa(thresh = 0) %>%
pass-> psInSamples
(NInSamples <- dim(otu_table(psInSamples))[2])
[1] 960
# Number of taxa with unidentified phyla
psInSamples %>%
subset_taxa(!is.na(Phylum)& !Phylum %in% c("", "uncharacterized")) %>%
pass -> psIdentifiedPhylum
(NUnkPhylum <- NInSamples - dim(otu_table(psIdentifiedPhylum))[2])
[1] 31
filterPhyla
[1] "Verrucomicrobia" "Tenericutes" "Elusimicrobia" "Synergistetes"
# Phyla removed because they are in filterPhyla
# -- each of which show up fewer than 20 times in the data set
psIdentifiedPhylum %>%
subset_taxa(!Phylum %in% filterPhyla) %>%
pass -> psNotPhylaFiltered
(NFiltPhyla <- dim(otu_table(psIdentifiedPhylum))[2] - dim(otu_table(psNotPhylaFiltered))[2])
[1] 7
# Taxa removed because there were in fewer than 10% of the samples
psNotPhylaFiltered %>%
prevalence_filter_taxa %>%
pass -> psPFT
dim(otu_table(psNotPhylaFiltered))[2] - dim(otu_table(psPFT))[2]
[1] 386
How to participants’ immune profiles change over time?
# When were participants vaccinated?
# Copied from protocol apendix E
# visitno 1 is a screening visit, I assign it NaN
dayTable = data.frame(
visitno = seq(from = 1, to = 14, by = 1),
day = c(NaN, 0, 14, 28, 42, 84, 98, 168, 182, 196, 273, 364, 455, 545),
month = c(NaN, 0, 0.5, 1, 1.5, 3, 3.5, 6, 6.5, 7, 9, 12, 15, 18)
)
vac <- data.frame(
visitno = c(2, 4, 6, 8)
)
vac <- left_join(vac, dayTable, by = 'visitno')
vac
# Representitive antigens for further considerations
# These are essentially zero (mag = 1) at baseline
ants1 <- c('Con.6.gp120.B', 'ZM96.gp140', 'gp70_B.CaseA_V1_V2')
# These have measurable baseline magnitudes
ants2 <- c('gp41', 'p24')
donor.immune <- psN2 %>% sample_data %>% as('data.frame') %>% dplyr::select(pub_id) %>%
left_join(immune.data, by = 'pub_id')
Column `pub_id` has different attributes on LHS and RHS of join
donor.immune %>% head
psN %>% sample_data %>%
as('data.frame') %>%
filter(!is.na(pub_id)) %>%
pull(pub_id) %>%
unique %>%
pass -> microbiomeCohort
immune.data %>% filter(pub_id %in% microbiomeCohort) %>%
pass -> donor.immune
donor.immune %>% head
options(par0)
iggplot <- immune.data %>%
mutate(inCohort = pub_id %in% microbiomeCohort) %>%
filter(type == 'IgG', antigen %in% c(ants1, ants2)) %>%
mutate(antigen = factor(antigen, levels = c(ants2, ants1))) %>% # reorder facets
ggplot(aes(x = month, y =mag, group = pub_id, colour = inCohort, alpha = inCohort)) +
geom_line() +
geom_point() +
geom_rug(data = vac, aes(x = month), inherit.aes = F, color = 'blue') +
facet_grid(antigen ~ rx_code, labeller = label_wrap_gen()) +
theme_bw() +
theme(strip.text.y = element_text(angle = 0),
axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_y_log10(breaks = 10^(0:5)) +
scale_colour_manual(values = c("black", "red")) +
scale_alpha_manual(values = c(.6, 1)) +
labs(y = "BAMA Response Magnitude")
ggsave('figures/useiggsAllParticipants.svg')
Saving 7 x 7 in image
# To fix. Control groups don't show up in this version.
iggplot
iggplot <- immune.data %>%
mutate(inCohort = pub_id %in% microbiomeCohort) %>%
filter(type %in% c('IgA', 'CD4+') & antigen %in% c(ants2, 'ANY.ENV.PTEG'))%>%
mutate(antigen = factor(antigen, levels = c(ants2, 'ANY.ENV.PTEG'))) %>% # reorder facets
ggplot(aes(x = month, y =mag, group = pub_id, colour = inCohort, alpha = inCohort)) +
geom_line() +
geom_point() +
geom_rug(data = vac, aes(x = month), inherit.aes = F, color = 'blue') +
facet_grid(antigen ~ rx_code, labeller = label_wrap_gen(), scales = 'free_y') +
theme_bw() +
theme(strip.text.y = element_text(angle = 0),
axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_y_log10(breaks = 10^c(
seq(from = -2, to = 0, by = 0.25), seq(from = 0, to = 5, by = 1)
), labels = function(x) round(as.numeric(x), digits=3)) +
scale_colour_manual(values = c("black", "red")) +
scale_alpha_manual(values = c(.6, 1)) +
labs(y = "BAMA Response Magnitude")
iggplot
ggsave('figures/useIgACD4AllParticipants.svg')
Saving 7.29 x 4.5 in image
# To fix. Control groups don't show up in this version.
iggplot <- donor.immune %>% filter(type == 'IgG', antigen %in% c(ants1, ants2)) %>%
mutate(antigen = factor(antigen, levels = c(ants2, ants1))) %>% # reorder facets
ggplot(aes(x = month, y =mag, group = pub_id)) + geom_point(alpha = 0.6) + geom_line(alpha = 0.4) +
geom_rug(data = vac, aes(x = month), inherit.aes = F, color = 'blue') +
facet_grid(antigen ~ rx_code, labeller = label_wrap_gen()) +
theme_bw() + theme(strip.text.y = element_text(angle = 0), axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_y_log10(breaks = 10^(0:5)) +
labs(y = "BAMA Response Magnitude")
iggplot
ggsave('figures/useiggs.svg') # I can no longer save pngs with transparency, going to svg
Saving 7.29 x 4.5 in image
# To fix. Control groups don't show up in this version.
colnames(immune.data)
[1] "visitno" "rx_code" "type" "antigen" "mag" "mag_bl" "response" "day" "month"
[10] "ct" "response_j" "assay" "pub_id"
Plan. Calculate mean and variance of each antigen-type at visit 9, and at baseline.
#
geomean <- function(x, na.rm = FALSE, trim = 0, ...)
{
exp(mean(log(x, ...), na.rm = na.rm, trim = trim, ...))
}
geosd <- function(x, na.rm = FALSE, ...)
{
exp(sd(log(x, ...), na.rm = na.rm, ...))
}
immune.data %>% dplyr::filter(visitno %in% c(9), ct == "T") %>% dplyr::select(pub_id, antigen, visitno, mag, mag_bl, type) %>% group_by(antigen, type) %>%
summarize(mean_mag = geomean(mag), mean_bl = geomean(mag_bl), sd_mag = geosd(mag), sd_bl = geosd(mag_bl)) %>%
mutate(var_over_mean_mag = sd_mag^2/mean_mag, var_over_mean_bl = sd_bl^2/mean_bl)
Mean variance and variance over mean of each value.
immune.data %>% dplyr::filter(visitno %in% c(9), ct == "T") %>% dplyr::select(pub_id,rx_code, antigen, visitno, mag, mag_bl, type) %>% group_by(rx_code, antigen, type) %>%
summarize(mean_mag = geomean(mag), mean_bl = geomean(mag_bl), sd_mag = geosd(mag), sd_bl = geosd(mag_bl)) %>%
mutate(var_over_mean_mag = sd_mag^2/mean_mag, var_over_mean_bl = sd_bl^2/mean_bl) %>%
gather(key = "meas", value = "value", mean_mag, mean_bl, sd_mag, sd_bl, var_over_mean_mag, var_over_mean_bl) %>%
group_by(antigen, type, meas) %>% summarize(mean_val = mean(value)) %>%
spread(key = "meas", value = "mean_val")
As above, but this time, calculated seperately for each treatment group and then those values averaged. ZM96 and gp70 surprisingly large variance over mean. Maybe should look at delta magnitude
immune.data%>% head
immune.data %>%
mutate(mag_delta = mag / mag_bl) %>%
mutate(mag_delta = if_else(mag_delta < 1, 1, mag_delta)) %>%
dplyr::filter(visitno %in% c(9), ct == "T") %>% dplyr::select(pub_id,rx_code, antigen, visitno, mag, mag_bl, mag_delta, type) %>% group_by(rx_code, antigen, type) %>%
summarize(mean_mag = geomean(mag), mean_bl = geomean(mag_bl), mean_delta = geomean(mag_delta), sd_mag = geosd(mag), sd_bl = geosd(mag_bl), sd_delta = geosd(mag_delta)) %>%
mutate(var_over_mean_mag = sd_mag^2/mean_mag, var_over_mean_bl = sd_bl^2/mean_bl, var_over_mean_delta = sd_delta^2/mean_delta) %>%
gather(key = "meas", value = "value", mean_mag, mean_bl, sd_mag, sd_bl, var_over_mean_mag, var_over_mean_bl, mean_delta, sd_delta, var_over_mean_delta) %>%
group_by(antigen, type, meas) %>% summarize(mean_val = mean(value)) %>%
spread(key = "meas", value = "mean_val") %>%
arrange(type)
immune.data %>%
group_by(rx_code) %>%
summarize(Unique_ids = n_distinct(pub_id))
donor.immune %>%
group_by(rx_code) %>%
summarize(Unique_ids = n_distinct(pub_id))
Number of participants with a given day of first sample.
psN2 %>% sample_data %>% data.frame %>% group_by(muVisit) %>% summarize(n = length(muVisit))
How many donors were there from each treatment?
psN2 %>% sample_data %>% data.frame %>% group_by(rx_code) %>% summarize(n = length(muVisit))
Breakdown by both visit and treatment
sampleBreakdown <- psN2 %>% sample_data %>% data.frame %>% group_by(muVisit, rx_code) %>% summarize(n = length(muVisit)) %>% spread(key = rx_code, value = n, fill = 0) %>% mutate(total = T1 + T2 + T3 + T4)
sbcs <- colSums(sampleBreakdown)
sampleBreakdown <- bind_rows(sampleBreakdown, sbcs)
sampleBreakdown %>%
kable("html", escape = F, digits = 3, align = 'c') %>%
#kable_styling("striped", "hover", full_width = F) %>%
#collapse_rows(columns = 1:2, latex_hline = "full") %>%
pass-> sampleBreakdown.html
sampleBreakdown.html
| muVisit | T1 | T2 | T3 | T4 | total |
|---|---|---|---|---|---|
| 2 | 3 | 0 | 0 | 0 | 3 |
| 9 | 4 | 2 | 5 | 0 | 11 |
| 12 | 0 | 1 | 1 | 5 | 7 |
| 23 | 7 | 3 | 6 | 5 | 21 |
sampleBreakdown.html %>% cat(file = 'tables/sampleBreakdown.html')
psN2A.wuf <- phyloseq::distance(psN2A, method = "wunifrac")
psN2A %>% sample_data %>% .[1:5, 1:10]
Sample Data: [5 samples by 10 sample variables]:
psN2A %>% sample_data %>% as("data.frame") %>% rownames_to_column(var = "Sample") %>% dplyr::select(Sample, pub_id) %>%
pass -> S2P
All.equal <- Vectorize(function(x,y){x == y})
## Convert distance matrix into long form matrix
psN2A.wuf %>% as.matrix %>% as.data.frame %>%
rownames_to_column(var = "SampleX")%>%
gather(key = "SampleY", value = "WufDist", -SampleX) %>%
left_join(S2P, by = c("SampleX" = "Sample")) %>% rename(pub_id_x = pub_id) %>%
left_join(S2P, by = c("SampleY" = "Sample")) %>% rename(pub_id_y = pub_id) %>%
mutate(SampleX = as.numeric(str_extract(SampleX, "[0-9][0-9]"))) %>%
mutate(SampleY = as.numeric(str_extract(SampleY, "[0-9][0-9]"))) %>%
## discard diagonal discard and upper half of triangular matrix
filter(SampleX < SampleY) %>%
mutate(isSamePerson = All.equal(pub_id_x, pub_id_y)) %>%
# ## discard cases where pub_id is unknown
# filter(is.finite(pub_id_x) & is.finite(pub_id_y))
pass -> AllWufDist
AllWufDist %>% head
#AllWufDist %>% ggplot(aes(x = isSamePerson, y = WufDist)) + geom_violin() + geom_dotplot(binaxis = "y", stackdir = "center", binwidth = .005)
Get Mean values for between participant and within participant weighted unifrac distances.
WufMeans <- AllWufDist %>% group_by(isSamePerson) %>% summarize(mean = mean(WufDist))
WufMeans
#SameAndDiff <- data.frame(comparison = c("different", "same"), WufMeans$mean)
SameAndDiff <- WufMeans %>% spread(key = isSamePerson, value = mean) %>% rename(between = 'FALSE', within = 'TRUE') %>% mutate(diff = between-within)
SameAndDiff
#SameAndDiff[1,2] - SameAndDiff[2,2] # Difference in weighted unifrac dissimilarity between same and different partcipants
Bootstrap some confidence intervals of within and between participant weighted unifrac distances.
# Split the data
SamePersonWufDist <- AllWufDist %>% filter(isSamePerson) #%>% dplyr::select(WufDist)
DifferentPersonWufDist <- AllWufDist %>% filter(!isSamePerson)
set.seed(334)
bootsSame <- rsample::bootstraps(SamePersonWufDist, times = 10000)
bootsDifferent <- rsample::bootstraps(DifferentPersonWufDist, times = 10000)
mean_of_bootstrap <- function(split){
locVals <- rsample::analysis(split)$WufDist
mean(locVals)
}
boot_meansSame <- bootsSame %>% mutate(mean = map_dbl(splits, mean_of_bootstrap)) %>% dplyr::select(mean)
boot_meansDifferent <- bootsDifferent %>% mutate(mean = map_dbl(splits, mean_of_bootstrap)) %>% dplyr::select(mean)
boot_means <- bind_cols(within = boot_meansSame$mean, between = boot_meansDifferent$mean) %>%
mutate(isDifferentBigger = between>within,
DifMinusSame = between - within)
#boot_means
1- sum(boot_means$isDifferentBigger)/length(boot_means$isDifferentBigger)
[1] 0.0116
The above is a bootstrapped P value that the two are different from eachother. Per a conversation I had with Klaus Hubert. Still need to find a justification that this approach is legit.
boot_means %>% ggplot(aes(x = DifMinusSame)) + geom_histogram()
A histogram of bootstrapped differences between within participant and between participant mean values.
quantile(boot_means$DifMinusSame, c(0.025, 0.5, 0.975))
2.5% 50% 97.5%
0.01288475 0.08727948 0.15505984
95% confidence intervals of the differences between same and different person microbiota.
PermWufDist <- AllWufDist %>% modelr::permute(10000, WufDist)
mean_of_is_same <- function(df){df %>% as.data.frame %>% group_by(isSamePerson) %>% summarize(mean(WufDist)) %>% spread(isSamePerson, `mean(WufDist)`)}
test <- PermWufDist %>% pull(perm) %>% .[[1]] %>% as.data.frame
test %>% head
mean_of_is_same(test)
SameAndDiff
permutedMeans <- PermWufDist %>% mutate(meanvals = map(perm, mean_of_is_same)) %>% dplyr::select(meanvals) %>% unnest %>%
rename(between = `FALSE`, within = `TRUE`) %>%
mutate(diff = between - within, absdif = abs(diff)) %>%
mutate(isExtreme = absdif >= SameAndDiff$diff, isExtreme1Tail = diff >= SameAndDiff$diff)
#permutedMeans %>% ggplot(aes(x = diff)) + geom_histogram() + geom_vline(xintercept = SameAndDiff$diff)
permutedMeans %>% ggplot(aes(x = diff)) + geom_histogram() + geom_vline(xintercept = SameAndDiff$diff)
Fraction of permuted values less extreme than difference between same and different.
#permutedMeans %>% mutate(isExtreme = absdif >= SameAndDiff$diff)
sum(permutedMeans$isExtreme) / length(permutedMeans$isExtreme)
[1] 0.0153
sum(permutedMeans$isExtreme1Tail) / length(permutedMeans$isExtreme1Tail)
[1] 0.0073
Two and one tailed permuted p-values
options(repr.plot.width=6, repr.plot.height= 6)
boot_means %>% dplyr::select(within, between) %>% gather(key = "comparison", value = "WufDist") %>% ggplot(aes(x = comparison, y = WufDist)) +
geom_dotplot(data = AllWufDist %>% mutate(isSamePerson2 = if_else(isSamePerson, "within", "between"))
, aes(x = isSamePerson2, y = WufDist), binaxis = "y", stackdir = "center", binwidth = .01, colour = "gray40", fill = "white") +
geom_violin(fill = NA) +
geom_point(data = data.frame(comparison = c("within", "between"), WufDist = WufMeans$mean), aes(x = comparison, y = c(WufDist[2], WufDist[1])), shape = 22, fill = "black", size = 2)+
theme_bw() +
scale_x_discrete(limits = c("within", "between")) + labs(y = "Weighted Unifrac Distance") #
ggsave('figures/BetweenVsWithin.png')
Saving 7.29 x 4.5 in image
Figure: Open circles represent weighed unifrac distances associated with pairs of samples taken from the same set of participants, at different time points (“within”), and samples taken from different sets of participants (“between”). Black squares indicate the observed mean of the within and between values. Violins indicate distributions of bootstrapped mean values.
psN2.wuf <- phyloseq::distance(psN2, method = "wunifrac")
psN2.pcoa <- capscale(psN2.wuf ~ 1)
# How much variance si explained by each weighted unifrac axis
# Note, ten axes cover 95% of the variance.
# I'm not going to look beyond that for any test.
data.frame(eig = psN2.pcoa$CA$eig) %>%
rownames_to_column('axis') %>%
mutate(proportion = eig/sum(eig)) %>%
mutate(cumulative = cumsum(proportion))
my_breaks = c(1, 75, 250, 500, 1000,2000)
psN2 %>% mutate_phyloseq_sample(
mc41 = factor(medcode_hl(IgG_gp41_Month_0)),
log120 = (IgG_Con.6.gp120.B_Month_12)) -> psN2_mod
psN2_mod%>%
sample_data() %>%
ggplot(aes(x = MDS1, y = MDS2)) + geom_point(aes(fill = mc41), size = 5, stroke = 1, shape = 21) +
coord_fixed(sqrt(psN2.pcoa$CA$eig[2]/psN2.pcoa$CA$eig[1])) +
viridis::scale_fill_viridis(name = 'gp41 Baseline', direction = -1, discrete = TRUE) +
#scale_colour_manual(name = 'gp41 Primary', values = c('black', 'grey70')) +
theme_bw() -> wuford_gp41
psN2_mod %>%
sample_data() %>%
ggplot(aes(x = MDS1, y = MDS2)) + geom_point(aes(fill = log120), size = 5, stroke = 1, shape = 21) +
coord_fixed(sqrt(psN2.pcoa$CA$eig[2]/psN2.pcoa$CA$eig[1])) +
viridis::scale_fill_viridis(name = 'gp120 Final', direction = 1, trans = "sqrt",
breaks = my_breaks, labels = my_breaks) +
#scale_colour_manual(name = 'gp41 Primary', values = c('black', 'grey70')) +
theme_bw() -> wuford_gp120
par <- options()
options(repr.plot.width=11, repr.plot.height= 4)
#g <- grid.arrange(wuford_gp41, wuford_gp120, ncol = 2)
g <- cowplot::plot_grid(wuford_gp41, wuford_gp120, ncol = 2, labels = c("A", "B"), label_size = 24)
g
#ggsave('figures/wunifrac_Agp41_Bgp120_pcoa.png', g, width = 8, height = 4)
#ggsave('figures/wunifrac_Agp41_Bgp120_pcoa.svg', g, width = 8, height = 4)
cowplot::save_plot('figures/wunifrac_Agp41_Bgp120_pcoa.png', g, base_width = 8, base_height = 4)
cowplot::save_plot('figures/wunifrac_Agp41_Bgp120_pcoa.svg', g, base_width = 8, base_height = 4)
wufKN2 <- D2K(as.matrix(psN2.wuf))
muDoners <- unique(sample_data(psN2)$pub_id)
immune.data %>%
filter(pub_id %in% muDoners) %>%
filter(
(type == 'IgG' &
antigen %in% ants1 &
month %in% c(6.5,12)
) |
(type %in% c('IgG', 'IgA') &
antigen %in% ants2 &
month %in% c(0,6.5,12)
) |
type == 'CD4+' &
antigen == 'ANY.ENV.PTEG' &
month %in% c(6.5, 12)
)-> use.immune
head(use.immune)
# Do permanova and related tests to a variable of interest
# This function is pretty specific to this analysis, so I'm going to leave it
# here in the notebook file
CapVar <- function(x, nperm = 9999, transformation = medcode2, family = 'binomial'){
## Pull out the needed data
psN2.wMDS <- psN2 %>% phylo_join(scores(psN2.pcoa, display = "sites", choices = 1:10) %>%
as.data.frame %>% rownames_to_column, by = 'rowname')
medWuf <- NA
rankWuf <- NA
locPS <- phylo_join(psN2.wMDS, x, by = 'pub_id')
ydata0 <- sample_data(locPS)$mag
yna <- is.na(ydata0)
#loc.wuf <- wufKN2
#loc.jsd <- jsdKN2
ydata <- ydata0
ydata <- ydata0[!yna]
loc.wuf2 <- psN2.wuf %>% as.matrix %>% .[!yna, !yna]
medWuf <- adonis(loc.wuf2 ~ transformation(ydata), permutations = nperm)
#medWuf$aov.tab[1,c('R2', 'Pr(>F)')]
## Capscale returns the same results as adonis (permanova), but also gives some other interesting results
medWufCap <- capscale(loc.wuf2 ~ transformation(ydata))
capanova <- anova(medWufCap, permutations = nperm)
samDf <- locPS %>% sample_data %>% as('data.frame') %>% rownames_to_column %>%
left_join(
vegan::scores(medWufCap, display = 'sites') %>% as.data.frame %>% dplyr::select(CAP1) %>%
rownames_to_column, by = 'rowname') %>% .[!yna,]
# # Is giving only positive results with CAP1, not sure why
# glmAnova <- glm(medcode(ydata) ~ MDS1 + CAP1, data = samDf, family = 'binomial') %>% anova(test = "Chisq")
loc_glm <- glm(transformation(ydata) ~ MDS1, data = samDf, family = family)
glmAnova <- loc_glm %>% anova(test = "Chisq")
#glmAnova['CAP1', 'Deviance']/out_capanova['NULL', 'Resid. Dev']
## check against mirkat
loc.Kwuf2 <- wufKN2[!yna, !yna]
mirkatP <- MiRKAT(y = transformation(ydata), Ks = loc.Kwuf2, out_type = "C", method = 'permutation', nperm = nperm)
#list(medWuf, capanova, mirkatP)
pred_pct <- predict(loc_glm, type = "response")
pred_01 <- as.numeric(predict(loc_glm, type = "response") > 0.5)
accuracy <- mean(transformation(ydata) == pred_01)
null_glm <- update(loc_glm, ~1)
# Canonical caluclation of McFadden's R2 for the GLM
McFadden = 1- (logLik(loc_glm)/ logLik(null_glm))
L.full = logLik(loc_glm)
D.full = -2 * L.full
L.base = logLik(null_glm)
D.base = -2 * L.base
n = dim(samDf)[1]
Nagelkerke = (1 - exp((D.full - D.base)/n))/(1 - exp(-D.base/n))
# A GLM of all weighted unifrac components
data.frame(
caps.P = capanova['Model', 'Pr(>F)'],
adonisP = medWuf$aov.tab[1, 'Pr(>F)'],
mir.P = mirkatP,
caps.F = capanova['Model', 'F'],
caps.R2 = medWufCap$CCA$tot.chi/medWufCap$tot.chi,
wuf1.P = glmAnova['MDS1', 'Pr(>Chi)'],
wuf1.DR = glmAnova['MDS1', 'Deviance'] / glmAnova['NULL', 'Resid. Dev'],
wuf1.McFadden = Nagelkerke,
accuracy,
wuf1.coef = coef(loc_glm)[2]
#cap1.P = glmAnova['CAP1', 'Pr(>Chi)'],
#cap1.R2 = glmAnova['CAP1', 'Deviance'] / glmAnova['NULL', 'Resid. Dev']
)
}
use.immune %>%
filter(type == 'IgG' & antigen == 'gp41'& month == 0 & ct == 'T') -> test.immune1
# Just confirming that the function works before it goes in a giant loop. I'd delete this,
# but i'll just end up needing it again if I do.
ptm = proc.time()
tps <- CapVar(test.immune1, nperm = 9999, transformation = medcode, family = 'binomial')
proc.time() - ptm
user system elapsed
0.710 0.028 0.737
tps
use.immune %>%
filter(type == 'CD4+' & month == 6.5 & ct == 'T') -> test.immune.pteg
# Just confirming that the function works before it goes in a giant loop. I'd delete this,
# but i'll just end up needing it again if I do.
ptm = proc.time()
tps <- CapVar(test.immune.pteg, nperm = jnperm, transformation = medcode, family = 'binomial')
proc.time() - ptm
user system elapsed
5.419 0.080 5.497
tps
# Run above function against every relevant variable.
ptm <- proc.time()
use.immune %>%
filter(ct == 'T') %>%
group_by(type, antigen, month) %>%
do(data.frame(CapVar(., nperm = jnperm))) -> permKernTable
|==== | 5% ~2 m remaining
|========= | 10% ~2 m remaining
|============== | 15% ~2 m remaining
|=================== | 20% ~2 m remaining
|======================== | 25% ~1 m remaining
|============================= | 30% ~1 m remaining
|================================= | 35% ~1 m remaining
|====================================== | 40% ~1 m remaining
|=========================================== | 45% ~1 m remaining
|================================================ | 50% ~58 s remaining
|===================================================== | 55% ~52 s remaining
|========================================================== | 60% ~46 s remaining
|=============================================================== | 65% ~41 s remaining
|=================================================================== | 70% ~35 s remaining
|======================================================================== | 75% ~29 s remaining
|============================================================================= | 80% ~24 s remaining
|================================================================================== | 85% ~18 s remaining
|======================================================================================= | 90% ~12 s remaining
|============================================================================================ | 95% ~6 s remaining
|=================================================================================================|100% ~0 s remaining
permKernTable
proc.time() - ptm
user system elapsed
115.577 0.751 116.264
The above function runs several extra tests. Results as follows:
type antigen visitno - things we run over
caps.P - Capscale test asks whether if we rotate things a bit and then try to use the best axis to compare to the data. Its similar to the wuf1.P value, but with some rotation
adonisP - p-value for a permanova test. Similar to mirkat p-value. One key exception is that igg_gp41_Month_0 falls on different sides of the 0.05 threshold.
mir.P is the p value for the kernel regression test, as run in the MiRKAT package. (Zhao et al., 2015)
caps.F and caps R2 are the f and r squared values for the capscale test.
wuf.P - is the p value of a glm comparing weighted unifrac component one against variables of interest. This test appears to always be statistically significantly positive when the mirkat test is positve.
wuf1.DR - one way of calculating an R2 value from a glm. We devide the deviance by the residual deviance
wuf1.McFadden - is a McFadden’s pseudo R^2. This turns out to be identical to the previous calculation.
accuracy - the fraction of the time that the glm predicts something falls above or below the median correctly. This turns out to not be super informative. Everything has around a 60% accuracy.
wuf1.coef - the coefficient of the glm model. The sign is relevant. Things with postive sign are associated with high values of weighted unifrac axis 1.
# Clean up so we just see the results of the kernel regression
concisePermKernTable <- permKernTable %>% ungroup %>%
mutate(Kernel_Q = p2q(mir.P), MDS1_Q = p2q(wuf1.P)) %>%
dplyr::select(Type = type, Antigen = antigen,Month = month, Kernel_P = mir.P, Kernel_Q,
MDS1_P = wuf1.P, MDS1_Q, GlmMDS1_R2 = wuf1.McFadden, MDS1_Coef = wuf1.coef) %>%
as.data.frame %>%
pass -> concisePermKernTable
write.csv(format(concisePermKernTable, digits = 3), 'tables/concisePermkernTable.csv')
# export conditionally formatted table as html
colNames1 = c(' ' = 3, 'Kernel' = 2, 'MDS' = 4)
colNames2 = c('Type', 'Antigen', 'Month', 'P', 'Q', 'P', 'Q', 'R2', 'Coef' )
concisePermKernTable %>%
mutate(
# this row needs to happen first, since the reformatting of the nother numbers makes them harder to call
MDS1_Coef = cell_spec(format(MDS1_Coef, digits = 3), "html",
bold = ifelse(Kernel_P < 0.05,
T,
F),
italic = ifelse(Kernel_P < 0.05 & MDS1_Coef < 0,
T,
F),
background = ifelse(Kernel_P < 0.05, ifelse(MDS1_Coef < 0, "lightsalmon", "lightblue"), "")
),
Kernel_P = cell_spec(format_round(Kernel_P, 3), "html",
bold = ifelse(Kernel_P < 0.05, T, F),
background = ifelse(Kernel_P < 0.05, 'yellow', '')
),
Kernel_Q = cell_spec(format_round(Kernel_Q, 3), "html",
bold = ifelse(Kernel_Q < 0.2, T, F),
background = ifelse(Kernel_Q < 0.2, 'lightyellow', '')
),
MDS1_P = cell_spec(format_round(MDS1_P, 3), "html",
bold = ifelse(MDS1_P < 0.05, T, F),
background = ifelse(MDS1_P < 0.05, 'yellow', '')
),
MDS1_Q = cell_spec(format_round(MDS1_Q, 3), "html",
bold = ifelse(MDS1_Q < 0.2, T, F),
background = ifelse(MDS1_Q < 0.2, 'lightyellow', '')
),
#Month = cell_spec(format_round(Month,0), "html")
Month = cell_spec(Month, "html")
) %>%
mutate(Antigen = gsub('ANY.ENV.PTEG', 'Any ENV PTEG', Antigen)) %>%
mutate(Antigen = gsub('gp70_B.CaseA_V1_V2', 'gp70 B.CaseA V1-V2', Antigen)) -> toTable
toTable %>%
kable("html", escape = F, digits = 3, align = 'c', col.names = colNames2) %>%
kable_styling("striped", "hover", full_width = F) %>%
add_header_above(colNames1) %>%
collapse_rows(columns = 1:2, latex_hline = "full") -> concisePermKernTable.html
concisePermKernTable.html
| Type | Antigen | Month | P | Q | P | Q | R2 | Coef |
|---|---|---|---|---|---|---|---|---|
| CD4+ | Any ENV PTEG | 6.5 | 0.251 | 0.497 | 0.218 | 0.125 | 0.093 | -0.937 |
| 12 | 0.249 | 0.497 | 0.228 | 0.125 | 0.094 | -0.920 | ||
| IgA | gp41 | 0 | 0.958 | 0.958 | 0.651 | 0.219 | 0.013 | 0.333 |
| 6.5 | 0.206 | 0.497 | 0.143 | 0.109 | 0.129 | -1.133 | ||
| 12 | 0.744 | 0.930 | 0.855 | 0.259 | 0.002 | -0.136 | ||
| p24 | 0 | 0.893 | 0.952 | 0.320 | 0.161 | 0.061 | 0.752 | |
| 6.5 | 0.902 | 0.952 | 0.650 | 0.219 | 0.013 | -0.333 | ||
| 12 | 0.385 | 0.593 | 0.387 | 0.172 | 0.049 | -0.657 | ||
| IgG | Con.6.gp120.B | 6.5 | 0.004 | 0.086 | 0.002 | 0.011 | 0.497 | -3.104 |
| 12 | 0.036 | 0.154 | 0.010 | 0.017 | 0.361 | -2.289 | ||
| gp41 | 0 | 0.046 | 0.154 | 0.014 | 0.017 | 0.331 | 2.185 | |
| 6.5 | 0.045 | 0.154 | 0.030 | 0.031 | 0.267 | -1.809 | ||
| 12 | 0.651 | 0.868 | 0.779 | 0.248 | 0.005 | -0.205 | ||
| gp70 B.CaseA V1-V2 | 6.5 | 0.904 | 0.952 | 0.619 | 0.219 | 0.016 | -0.365 | |
| 12 | 0.035 | 0.154 | 0.014 | 0.017 | 0.332 | -2.135 | ||
| p24 | 0 | 0.218 | 0.497 | 0.444 | 0.179 | 0.037 | -0.567 | |
| 6.5 | 0.420 | 0.600 | 0.397 | 0.172 | 0.047 | -0.749 | ||
| 12 | 0.286 | 0.497 | 0.159 | 0.109 | 0.120 | -1.085 | ||
| ZM96.gp140 | 6.5 | 0.017 | 0.154 | 0.009 | 0.017 | 0.370 | -2.338 | |
| 12 | 0.298 | 0.497 | 0.162 | 0.109 | 0.118 | -1.076 |
concisePermKernTable.html %>% cat(file = 'tables/concisePermkernTable.html')
Latex version of the same table
docHead <- "\\documentclass[12pt]{article} % use larger type; default would be 10pt
\\usepackage[utf8]{inputenc} % set input encoding (not needed with XeLaTeX)
\\usepackage{booktabs}
\\usepackage{longtable}
\\usepackage{array}
\\usepackage{multirow}
\\usepackage[table]{xcolor}
\\usepackage{wrapfig}
\\usepackage{float}
\\usepackage{colortbl}
\\usepackage{pdflscape}
\\usepackage{tabu}
\\usepackage{threeparttable}
\\usepackage{threeparttablex}
\\usepackage[normalem]{ulem}
\\usepackage{makecell}
\\definecolor{green}{rgb}{1, 1, .9}
\\begin{document}
"
docTail <- "\\end{document}
"
# Make latex table
concisePermKernTable %>%
mutate(
# this row needs to happen first, since the reformatting of the nother numbers makes them harder to call
MDS1_Coef = cell_spec(format(MDS1_Coef, digits = 3), "latex",
bold = ifelse(MDS1_P < 0.05,
T,
F),
italic = ifelse(MDS1_P < 0.05 & MDS1_Coef < 0,
T,
F)),
Kernel_P = cell_spec(format_round(Kernel_P, 3), "latex",
bold = ifelse(Kernel_P < 0.05, T, F),
background = ifelse(Kernel_P < 0.05, 'yellow', 'white')
),
Kernel_Q = cell_spec(format_round(Kernel_Q, 3), "latex",
bold = ifelse(Kernel_Q < 0.2, T, F),
background = ifelse(Kernel_Q < 0.2, 'green', 'white')
),
MDS1_P = cell_spec(format_round(MDS1_P, 3), "latex",
bold = ifelse(MDS1_P < 0.05, T, F),
background = ifelse(MDS1_P < 0.05, 'yellow', 'white')
),
MDS1_Q = cell_spec(format_round(MDS1_Q, 3), "latex",
bold = ifelse(MDS1_Q < 0.2, T, F),
background = ifelse(MDS1_Q < 0.2, 'green', 'white')
),
#Month = cell_spec(format_round(Month,0), "html")
Month = cell_spec(Month, "latex")
) %>%
mutate(Antigen = gsub('ANY.ENV.PTEG', 'Any ENV PTEG', Antigen)) %>%
mutate(Antigen = gsub('gp70_B.CaseA_V1_V2', 'gp70 B.CaseA V1-V2', Antigen)) -> toTable
toTable %>%
kable("latex", escape = F, digits = 3, align = 'c', col.names = colNames2, booktabs = T) %>%
kable_styling(position = "left") %>%
add_header_above(colNames1) %>%
collapse_rows(columns = 1:2, latex_hline = "full") %>%
pass -> concisePermKernTable.tex
# Print latex table to tex file
cat(docHead, concisePermKernTable.tex, docTail, file = 'tables/concisePermkernTable1.tex')
concisePermKernTable %>% filter(Kernel_P < 0.05) -> shortPermkernTable
shortPermkernTable
write.csv(format(shortPermkernTable, digits = 3), 'tables/shortPermkernTable.csv')
Of kernel regression P values
my_runif <- function(Len){
loc_runif <- runif(n = Len)
sort_loc_runif <- sort(loc_runif)
data.frame(case = 1:Len, relement = sort_loc_runif)
}
calc_bound <- function(df, bound){
quantile(df$relement, bound)
}
make_qqdata <- function(pvec, nboot = 10000){
locP <- pvec
LP <- length(locP)
#1:10 %>% map(runif, n = LP)
sortedP <- sort(locP)
exp2 <- 1:length(locP)/length(locP)
sortedExpP <- sort(exp2)
random_pvalues <- data.frame(iter = 1:nboot) %>%
mutate(rand = map(iter, ~my_runif(Len = LP))) %>% unnest %>%
nest(-case) %>%
mutate(lb = map_dbl(data, ~calc_bound(., 0.025)),
ub = map_dbl(data, ~calc_bound(.,0.975))) %>% dplyr::select (-data)
qqdata <- bind_cols(sortedP = sortedP, sortedExpP = sortedExpP, random_pvalues)
return(qqdata)
}
qqdata_permKernMir <- make_qqdata(permKernTable$mir.P)
qqdata_permKernMir %>% str
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 20 obs. of 5 variables:
$ sortedP : num 0.00431 0.01671 0.03465 0.03584 0.04457 ...
$ sortedExpP: num 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 ...
$ case : int 1 2 3 4 5 6 7 8 9 10 ...
$ lb : num 0.00123 0.01314 0.03293 0.05827 0.08676 ...
$ ub : num 0.169 0.247 0.317 0.38 0.441 ...
options(repr.plot.width=6, repr.plot.height= 6)
qqdata_permKernMir %>% ggplot(aes(x = sortedExpP, y = sortedP)) + geom_point(shape = 1) + geom_abline(intercept = 0, slope = 1) +
labs(x = expression(paste("Expected ", italic("p"),"-value")), y = expression(paste("Observed ", italic("p"),"-value")))
ggsave('figures/qq_permKernMir.png')
Saving 7.29 x 4.5 in image
qqdata_permKernMir %>% ggplot(aes(x = sortedExpP, y = sortedP)) + geom_point(shape = 1) + geom_abline(intercept = 0, slope = 1) + geom_line(aes(y = lb), colour = "grey40") + geom_line(aes(y = ub), colour = "grey40") + scale_y_log10() + scale_x_log10()
We see from the qqplot that we generally fall below the 1:1 line, suggesting that our P values are lower than we would expect from chance. To make the grey lines, I sampled 95% confidence intervals of random draws from the uniform distribution (essentially random p values). The data points sometimes, but not always fall below the lower one of these. However, we care about all of the p values, not wheter eaach individual is more unusual than we would expect by chance. I’ll replort just the regular qqplot as a supplemental figure.
ptm = proc.time()
tps <- CapVar(test.immune1, nperm = 9999, transformation = function(x){jac_box_cox_2(x)}, family = 'gaussian')
proc.time() - ptm
user system elapsed
0.764 0.004 0.767
tps
# Run above function against every relevant variable.
ptm <- proc.time()
use.immune %>%
filter(ct == 'T') %>%
group_by(type, antigen, month) %>%
do(data.frame(CapVar(., nperm = jnperm,
transformation = function(x){jac_box_cox_2(x)},
family = 'gaussian'))) -> permKernTableGaus
|==== | 5% ~2 m remaining
|========= | 10% ~2 m remaining
|============== | 15% ~2 m remaining
|=================== | 20% ~2 m remaining
|======================== | 25% ~2 m remaining
|============================= | 30% ~1 m remaining
|================================= | 35% ~1 m remaining
|====================================== | 40% ~1 m remaining
|=========================================== | 45% ~1 m remaining
|================================================ | 50% ~59 s remaining
|===================================================== | 55% ~54 s remaining
|========================================================== | 60% ~48 s remaining
|=============================================================== | 65% ~42 s remaining
|=================================================================== | 70% ~36 s remaining
|======================================================================== | 75% ~30 s remaining
|============================================================================= | 80% ~24 s remaining
|================================================================================== | 85% ~18 s remaining
|======================================================================================= | 90% ~12 s remaining
|============================================================================================ | 95% ~6 s remaining
|=================================================================================================|100% ~0 s remaining
permKernTableGaus
proc.time() - ptm
user system elapsed
118.762 0.507 119.223
# Clean up so we just see the results of the kernel regression
concisePermKernTableGaus <- permKernTableGaus %>% ungroup %>%
mutate(Kernel_Q = p2q(mir.P), MDS1_Q = p2q(wuf1.P)) %>%
dplyr::select(Type = type, Antigen = antigen, Month = month, Kernel_P = mir.P, Kernel_Q,
MDS1_P = wuf1.P, MDS1_Q, MDS1_R2 = wuf1.McFadden, MDS1_Coef = wuf1.coef) %>%
as.data.frame %>%
pass
concisePermKernTableGaus
write.csv(format(concisePermKernTableGaus, digits = 3), 'tables/concisePermkernTableGaus.csv')
# export conditionally formatted table as html
concisePermKernTableGaus %>%
mutate(
# this row needs to happen first, since the reformatting of the nother numbers makes them harder to call
MDS1_Coef = cell_spec(format(MDS1_Coef, digits = 2), "html",
bold = ifelse(Kernel_P < 0.05,
T,
F),
italic = ifelse(Kernel_P < 0.05 & MDS1_Coef < 0,
T,
F),
background = ifelse(Kernel_P < 0.05, ifelse(MDS1_Coef < 0, "lightsalmon", "lightblue"), "")
),
Kernel_P = cell_spec(format_round(Kernel_P, 3), "html",
bold = ifelse(Kernel_P < 0.05, T, F),
background = ifelse(Kernel_P < 0.05, 'yellow', '')
),
Kernel_Q = cell_spec(format_round(Kernel_Q, 3), "html",
bold = ifelse(Kernel_Q < 0.2, T, F),
background = ifelse(Kernel_Q < 0.2, 'lightyellow', '')
),
MDS1_P = cell_spec(format_round(MDS1_P, 3), "html",
bold = ifelse(MDS1_P < 0.05, T, F),
background = ifelse(MDS1_P < 0.05, 'yellow', '')
),
MDS1_Q = cell_spec(format_round(MDS1_Q, 3), "html",
bold = ifelse(MDS1_Q < 0.2, T, F),
background = ifelse(MDS1_Q < 0.2, 'lightyellow', '')
),
Month = cell_spec(Month, "html")
) %>%
mutate(Antigen = gsub('ANY.ENV.PTEG', 'Any ENV PTEG', Antigen)) %>%
mutate(Antigen = gsub('gp70_B.CaseA_V1_V2', 'gp70 B.CaseA V1-V2', Antigen)) -> toTable
toTable %>%
kable("html", escape = F, digits = 3, align = 'c', col.names = colNames2) %>%
kable_styling("striped", "hover", full_width = F) %>%
add_header_above(colNames1) %>%
collapse_rows(columns = 1:2, latex_hline = "full") -> concisePermKernTableGaus.html
concisePermKernTableGaus.html
| Type | Antigen | Month | P | Q | P | Q | R2 | Coef |
|---|---|---|---|---|---|---|---|---|
| CD4+ | Any ENV PTEG | 6.5 | 0.602 | 0.803 | 0.597 | 0.344 | 0.015 | -0.196 |
| 12 | 0.185 | 0.492 | 0.329 | 0.248 | 0.054 | -0.356 | ||
| IgA | gp41 | 0 | 0.989 | 0.989 | 0.633 | 0.344 | 0.013 | 0.177 |
| 6.5 | 0.411 | 0.636 | 0.601 | 0.344 | 0.015 | -0.194 | ||
| 12 | 0.651 | 0.813 | 0.504 | 0.329 | 0.026 | 0.252 | ||
| p24 | 0 | 0.954 | 0.989 | 0.433 | 0.303 | 0.033 | 0.289 | |
| 6.5 | 0.976 | 0.989 | 0.857 | 0.441 | 0.002 | -0.067 | ||
| 12 | 0.575 | 0.803 | 0.309 | 0.248 | 0.058 | -0.377 | ||
| IgG | Con.6.gp120.B | 6.5 | 0.073 | 0.366 | 0.032 | 0.093 | 0.208 | -0.721 |
| 12 | 0.001 | 0.014 | 0.000 | 0.000 | 0.530 | -1.150 | ||
| gp41 | 0 | 0.221 | 0.492 | 0.038 | 0.093 | 0.197 | 0.701 | |
| 6.5 | 0.219 | 0.492 | 0.080 | 0.102 | 0.148 | -0.608 | ||
| 12 | 0.263 | 0.514 | 0.173 | 0.169 | 0.095 | -0.486 | ||
| gp70 B.CaseA V1-V2 | 6.5 | 0.282 | 0.514 | 0.206 | 0.183 | 0.083 | -0.454 | |
| 12 | 0.111 | 0.444 | 0.083 | 0.102 | 0.145 | -0.602 | ||
| p24 | 0 | 0.753 | 0.886 | 0.907 | 0.444 | 0.001 | 0.044 | |
| 6.5 | 0.027 | 0.193 | 0.053 | 0.102 | 0.184 | -0.787 | ||
| 12 | 0.413 | 0.636 | 0.133 | 0.145 | 0.113 | -0.531 | ||
| ZM96.gp140 | 6.5 | 0.029 | 0.193 | 0.017 | 0.083 | 0.246 | -0.783 | |
| 12 | 0.198 | 0.492 | 0.078 | 0.102 | 0.150 | -0.612 |
concisePermKernTableGaus.html %>% cat(file = 'tables/concisePermkernTableGaus.html')
concisePermKernTableGaus %>% filter(Kernel_P < 0.05) -> shortPermkernTableGaus
shortPermkernTableGaus
write.csv(format(shortPermkernTableGaus, digits = 3), 'tables/shortPermkernTable.csv')
# Make latex table
concisePermKernTableGaus %>%
mutate(
# this row needs to happen first, since the reformatting of the nother numbers makes them harder to call
MDS1_Coef = cell_spec(format(MDS1_Coef, digits = 3), "html",
bold = ifelse(Kernel_P < 0.05,
T,
F),
italic = ifelse(Kernel_P < 0.05 & MDS1_Coef < 0,
T,
F),
background = ifelse(Kernel_P < 0.05, ifelse(MDS1_Coef < 0, "lightsalmon", "lightblue"), "")
),
Kernel_P = cell_spec(format_round(Kernel_P, 3), "latex",
bold = ifelse(Kernel_P < 0.05, T, F),
background = ifelse(Kernel_P < 0.05, 'yellow', 'white')
),
Kernel_Q = cell_spec(format_round(Kernel_Q, 3), "latex",
bold = ifelse(Kernel_Q < 0.2, T, F),
background = ifelse(Kernel_Q < 0.2, 'green', 'white')
),
MDS1_P = cell_spec(format_round(MDS1_P, 3), "latex",
bold = ifelse(MDS1_P < 0.05, T, F),
background = ifelse(MDS1_P < 0.05, 'yellow', 'white')
),
MDS1_Q = cell_spec(format_round(MDS1_Q, 3), "latex",
bold = ifelse(MDS1_Q < 0.2, T, F),
background = ifelse(MDS1_Q < 0.2, 'green', 'white')
),
#Month = cell_spec(format_round(Month,0), "html")
Month = cell_spec(Month, "latex")
) %>%
mutate(Antigen = gsub('ANY.ENV.PTEG', 'Any ENV PTEG', Antigen)) %>%
mutate(Antigen = gsub('gp70_B.CaseA_V1_V2', 'gp70 B.CaseA V1-V2', Antigen)) -> toTable
toTable %>%
kable("latex", escape = F, digits = 3, align = 'c', col.names = colNames2, booktabs = T) %>%
kable_styling(position = "left") %>%
add_header_above(colNames1) %>%
collapse_rows(columns = 1:2, latex_hline = "full") %>%
pass -> concisePermKernTableGaus.tex
# Print latex table to tex file
cat(docHead, concisePermKernTableGaus.tex, docTail, file = 'tables/concisePermkernTableGaus.tex')
### QQplot for kernel regression data
qqdata_permKernMirGaus <- make_qqdata(permKernTableGaus$mir.P)
options(repr.plot.width=6, repr.plot.height= 6)
qqdata_permKernMirGaus %>% ggplot(aes(x = sortedExpP, y = sortedP)) + geom_point(shape = 1) + geom_abline(intercept = 0, slope = 1) +
labs(x = expression(paste("Expected ", italic("p"),"-value")), y = expression(paste("Observed ", italic("p"),"-value")))
ggsave('figures/qq_permKernMirGaus.png')
Saving 7.29 x 4.5 in image
use.immune %>% dplyr::select(pub_id, visitno, type, antigen, mag) -> tmp
full_join(tmp, tmp, by = 'pub_id') %>%
group_by(visitno.x, type.x, antigen.x, visitno.y, type.y, antigen.y) %>%
nest %>%
mutate(x2 = map(data, function(df){unwarn(chisq.test(df$mag.x, df$mag.y))})) %>%
mutate(glance = map(x2, glance)) %>%
dplyr::select(-data, -x2) %>%
unnest(glance) %>%
#mutate(q.value = p2q(p.value)) %>% # reurns NaNs
pass -> compareImmuneX2
compareImmuneX2 %>% filter(
type.x == 'IgG' &
antigen.x == 'gp41' &
type.y == 'IgG' &
antigen.y == 'gp41'
)
compareImmuneX2 %>%
filter(type.x == 'IgG' & type.y == 'IgG' & antigen.x != antigen.y) %>%
write_csv('tables/chisq_IgG_comparasons.csv')
psN2 %>% phylo_join(scores(psN2.pcoa, display = "sites", choices = 1:10) %>%
as.data.frame %>% rownames_to_column, by = 'rowname') %>%
sample_data %>% as('data.frame') %>% rownames_to_column -> hereSam
data_frame(formula = paste("transformation(ydata) ~ MDS", 1:10, sep = ""))
EachMDS <- function(x, nperm = 9999, transformation = medcode2, family = 'binomial'){
## Pull out the needed data
psN2.wMDS <- psN2 %>% phylo_join(scores(psN2.pcoa, display = "sites", choices = 1:10) %>%
as.data.frame %>% rownames_to_column, by = 'rowname')
# medWuf <- NA
# rankWuf <- NA
locPS <- phylo_join(psN2.wMDS, x, by = 'pub_id')
ydata0 <- sample_data(locPS)$mag
yna <- is.na(ydata0)
#loc.wuf <- wufKN2
#loc.jsd <- jsdKN2
ydata <- ydata0
ydata <- ydata0[!yna]
loc.wuf2 <- psN2.wuf %>% as.matrix %>% .[!yna, !yna]
samDf <- locPS %>% sample_data %>% as('data.frame') %>% rownames_to_column %>%
.[!yna,]
# # Is giving only positive results with CAP1, not sure why
loc_glm <- glm(as.formula("transformation(ydata) ~ MDS1"), data = samDf, family = family)
glmAnova <- loc_glm %>% anova(test = "Chisq")
# data_frame, rather than data.frame
# https://stackoverflow.com/questions/48450308/iterating-over-formulas-in-purrr#48450308
data_frame(formulaString = paste("transformation(ydata) ~ MDS", 1:10, sep = "")) %>%
mutate(model = map(formulaString, function(fs){
glm(as.formula(fs), data = samDf, family = family)})) %>%
mutate(anova = map(model, anova)) %>%
mutate(glance = map(model, glance)) %>%
mutate(tidy = map(model, tidy)) %>%
mutate(coef = map(model, ~ coef(summary(.))[2,])) %>%
pass -> allmodels
allmodels %>% dplyr::select("tidy") %>% unnest %>% filter(term != '(Intercept)')
}
# Just confirming that the function works before it goes in a giant loop. I'd delete this,
# but i'll just end up needing it again if I do.
ptm = proc.time()
tps <- EachMDS(test.immune.pteg, nperm = 9999, transformation = medcode, family = 'binomial')
proc.time() - ptm
user system elapsed
0.188 0.000 0.187
tps
use.immune %>%
group_by(type, antigen, month) %>%
nest %>%
mutate(coefs = map(data, ~ EachMDS(.))) %>%
dplyr::select(-data) %>% unnest(coefs) -> glmMDScoefs
ants1
[1] "Con.6.gp120.B" "ZM96.gp140" "gp70_B.CaseA_V1_V2"
glmMDScoefs %>%
gather(key = "key", value = "value", estimate:p.value) %>%
filter(key == "p.value") %>%
spread(key = term, value = value) %>%
dplyr::select(-key, -MDS10, MDS10) %>%
dplyr::rename(Type = type, Antigen = antigen, Month = month) %>%
mutate(Type = factor(Type, levels = c( "IgA", "IgG", "CD4+"))) %>%
mutate(Antigen = factor(Antigen, levels = c(ants1, ants2, "ANY.ENV.PTEG"))) %>%
#Clean up labels
mutate(Antigen = stringr::str_replace_all(Antigen, "_", " ")) %>%
mutate(Antigen = stringr::str_replace_all(Antigen, "V1 V2", "V1-V2")) %>%
mutate(Antigen = stringr::str_replace_all(Antigen, "ANY.ENV.PTEG", "Any ENV PTEG")) %>%
arrange(Type) -> allMDS
allMDS
allMDS %>%
kable("html", escape = F, digits = 3, align = 'c') %>%
collapse_rows(columns = 1:2, latex_hline = "full") %>%
as.character() -> allMDS.html
allMDS %>%
kable("html", escape = F, digits = 3, align = 'c') %>%
collapse_rows(columns = 1:2, latex_hline = "full")
| Type | Antigen | Month | MDS1 | MDS2 | MDS3 | MDS4 | MDS5 | MDS6 | MDS7 | MDS8 | MDS9 | MDS10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IgA | gp41 | 0.0 | 0.653 | 0.607 | 0.401 | 0.702 | 0.786 | 0.650 | 0.242 | 0.349 | 0.038 | 0.700 |
| 6.5 | 0.168 | 0.368 | 0.701 | 0.954 | 0.112 | 0.142 | 0.456 | 0.475 | 0.789 | 0.864 | ||
| 12.0 | 0.855 | 0.201 | 0.462 | 0.408 | 0.994 | 0.798 | 0.469 | 0.106 | 0.079 | 0.232 | ||
| p24 | 0.0 | 0.335 | 0.629 | 0.520 | 0.939 | 0.666 | 0.657 | 0.148 | 0.853 | 0.921 | 0.517 | |
| 6.5 | 0.652 | 0.281 | 0.681 | 0.414 | 0.438 | 0.599 | 0.999 | 0.199 | 0.881 | 0.351 | ||
| 12.0 | 0.398 | 0.724 | 0.227 | 0.202 | 0.128 | 0.465 | 0.445 | 0.156 | 0.516 | 0.159 | ||
| IgG | Con.6.gp120.B | 6.5 | 0.017 | 0.536 | 0.420 | 0.890 | 0.325 | 0.427 | 0.853 | 0.806 | 0.634 | 0.250 |
| 12.0 | 0.031 | 0.878 | 0.377 | 0.635 | 0.989 | 0.540 | 0.569 | 0.374 | 0.686 | 0.299 | ||
| gp41 | 0.0 | 0.040 | 0.894 | 0.304 | 0.967 | 0.442 | 0.231 | 0.504 | 0.595 | 0.856 | 0.509 | |
| 6.5 | 0.057 | 0.274 | 0.226 | 0.891 | 0.858 | 0.165 | 0.601 | 0.426 | 0.212 | 0.606 | ||
| 12.0 | 0.779 | 0.310 | 0.443 | 0.228 | 0.680 | 0.268 | 0.995 | 0.039 | 0.357 | 0.054 | ||
| gp70 B.CaseA V1-V2 | 6.5 | 0.621 | 0.853 | 0.300 | 0.458 | 0.507 | 0.690 | 0.506 | 0.161 | 0.039 | 0.880 | |
| 12.0 | 0.037 | 0.665 | 0.808 | 0.447 | 0.318 | 0.296 | 0.743 | 0.143 | 0.403 | 0.276 | ||
| p24 | 0.0 | 0.451 | 0.231 | 0.986 | 0.088 | 0.149 | 0.051 | 0.882 | 0.507 | 0.384 | 0.439 | |
| 6.5 | 0.404 | 0.828 | 0.243 | 0.250 | 0.689 | 0.082 | 0.180 | 0.846 | 0.360 | 0.101 | ||
| 12.0 | 0.182 | 0.721 | 0.237 | 0.676 | 0.933 | 0.371 | 0.042 | 0.423 | 0.716 | 0.147 | ||
| ZM96.gp140 | 6.5 | 0.030 | 0.631 | 0.244 | 0.750 | 0.234 | 0.949 | 0.890 | 0.994 | 0.090 | 0.504 | |
| 12.0 | 0.186 | 0.353 | 0.286 | 0.722 | 0.889 | 0.409 | 0.459 | 0.376 | 0.021 | 0.189 | ||
| CD4+ | Any ENV PTEG | 6.5 | 0.237 | 0.555 | 0.771 | 0.108 | 0.237 | 0.104 | 0.717 | 0.084 | 0.627 | 0.319 |
| 12.0 | 0.248 | 0.565 | 0.434 | 0.103 | 0.117 | 0.856 | 0.861 | 0.128 | 0.773 | 0.723 |
allMDS.html
[1] "<table>\n <thead>\n <tr>\n <th style=\"text-align:center;\"> Type </th>\n <th style=\"text-align:center;\"> Antigen </th>\n <th style=\"text-align:center;\"> Month </th>\n <th style=\"text-align:center;\"> MDS1 </th>\n <th style=\"text-align:center;\"> MDS2 </th>\n <th style=\"text-align:center;\"> MDS3 </th>\n <th style=\"text-align:center;\"> MDS4 </th>\n <th style=\"text-align:center;\"> MDS5 </th>\n <th style=\"text-align:center;\"> MDS6 </th>\n <th style=\"text-align:center;\"> MDS7 </th>\n <th style=\"text-align:center;\"> MDS8 </th>\n <th style=\"text-align:center;\"> MDS9 </th>\n <th style=\"text-align:center;\"> MDS10 </th>\n </tr>\n </thead>\n<tbody>\n <tr>\n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"6\"> IgA </td>\n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"3\"> gp41 </td>\n <td style=\"text-align:center;\"> 0.0 </td>\n <td style=\"text-align:center;\"> 0.653 </td>\n <td style=\"text-align:center;\"> 0.607 </td>\n <td style=\"text-align:center;\"> 0.401 </td>\n <td style=\"text-align:center;\"> 0.702 </td>\n <td style=\"text-align:center;\"> 0.786 </td>\n <td style=\"text-align:center;\"> 0.650 </td>\n <td style=\"text-align:center;\"> 0.242 </td>\n <td style=\"text-align:center;\"> 0.349 </td>\n <td style=\"text-align:center;\"> 0.038 </td>\n <td style=\"text-align:center;\"> 0.700 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.168 </td>\n <td style=\"text-align:center;\"> 0.368 </td>\n <td style=\"text-align:center;\"> 0.701 </td>\n <td style=\"text-align:center;\"> 0.954 </td>\n <td style=\"text-align:center;\"> 0.112 </td>\n <td style=\"text-align:center;\"> 0.142 </td>\n <td style=\"text-align:center;\"> 0.456 </td>\n <td style=\"text-align:center;\"> 0.475 </td>\n <td style=\"text-align:center;\"> 0.789 </td>\n <td style=\"text-align:center;\"> 0.864 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.855 </td>\n <td style=\"text-align:center;\"> 0.201 </td>\n <td style=\"text-align:center;\"> 0.462 </td>\n <td style=\"text-align:center;\"> 0.408 </td>\n <td style=\"text-align:center;\"> 0.994 </td>\n <td style=\"text-align:center;\"> 0.798 </td>\n <td style=\"text-align:center;\"> 0.469 </td>\n <td style=\"text-align:center;\"> 0.106 </td>\n <td style=\"text-align:center;\"> 0.079 </td>\n <td style=\"text-align:center;\"> 0.232 </td>\n </tr>\n <tr>\n \n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"3\"> p24 </td>\n <td style=\"text-align:center;\"> 0.0 </td>\n <td style=\"text-align:center;\"> 0.335 </td>\n <td style=\"text-align:center;\"> 0.629 </td>\n <td style=\"text-align:center;\"> 0.520 </td>\n <td style=\"text-align:center;\"> 0.939 </td>\n <td style=\"text-align:center;\"> 0.666 </td>\n <td style=\"text-align:center;\"> 0.657 </td>\n <td style=\"text-align:center;\"> 0.148 </td>\n <td style=\"text-align:center;\"> 0.853 </td>\n <td style=\"text-align:center;\"> 0.921 </td>\n <td style=\"text-align:center;\"> 0.517 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.652 </td>\n <td style=\"text-align:center;\"> 0.281 </td>\n <td style=\"text-align:center;\"> 0.681 </td>\n <td style=\"text-align:center;\"> 0.414 </td>\n <td style=\"text-align:center;\"> 0.438 </td>\n <td style=\"text-align:center;\"> 0.599 </td>\n <td style=\"text-align:center;\"> 0.999 </td>\n <td style=\"text-align:center;\"> 0.199 </td>\n <td style=\"text-align:center;\"> 0.881 </td>\n <td style=\"text-align:center;\"> 0.351 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.398 </td>\n <td style=\"text-align:center;\"> 0.724 </td>\n <td style=\"text-align:center;\"> 0.227 </td>\n <td style=\"text-align:center;\"> 0.202 </td>\n <td style=\"text-align:center;\"> 0.128 </td>\n <td style=\"text-align:center;\"> 0.465 </td>\n <td style=\"text-align:center;\"> 0.445 </td>\n <td style=\"text-align:center;\"> 0.156 </td>\n <td style=\"text-align:center;\"> 0.516 </td>\n <td style=\"text-align:center;\"> 0.159 </td>\n </tr>\n <tr>\n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"12\"> IgG </td>\n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"2\"> Con.6.gp120.B </td>\n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.017 </td>\n <td style=\"text-align:center;\"> 0.536 </td>\n <td style=\"text-align:center;\"> 0.420 </td>\n <td style=\"text-align:center;\"> 0.890 </td>\n <td style=\"text-align:center;\"> 0.325 </td>\n <td style=\"text-align:center;\"> 0.427 </td>\n <td style=\"text-align:center;\"> 0.853 </td>\n <td style=\"text-align:center;\"> 0.806 </td>\n <td style=\"text-align:center;\"> 0.634 </td>\n <td style=\"text-align:center;\"> 0.250 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.031 </td>\n <td style=\"text-align:center;\"> 0.878 </td>\n <td style=\"text-align:center;\"> 0.377 </td>\n <td style=\"text-align:center;\"> 0.635 </td>\n <td style=\"text-align:center;\"> 0.989 </td>\n <td style=\"text-align:center;\"> 0.540 </td>\n <td style=\"text-align:center;\"> 0.569 </td>\n <td style=\"text-align:center;\"> 0.374 </td>\n <td style=\"text-align:center;\"> 0.686 </td>\n <td style=\"text-align:center;\"> 0.299 </td>\n </tr>\n <tr>\n \n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"3\"> gp41 </td>\n <td style=\"text-align:center;\"> 0.0 </td>\n <td style=\"text-align:center;\"> 0.040 </td>\n <td style=\"text-align:center;\"> 0.894 </td>\n <td style=\"text-align:center;\"> 0.304 </td>\n <td style=\"text-align:center;\"> 0.967 </td>\n <td style=\"text-align:center;\"> 0.442 </td>\n <td style=\"text-align:center;\"> 0.231 </td>\n <td style=\"text-align:center;\"> 0.504 </td>\n <td style=\"text-align:center;\"> 0.595 </td>\n <td style=\"text-align:center;\"> 0.856 </td>\n <td style=\"text-align:center;\"> 0.509 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.057 </td>\n <td style=\"text-align:center;\"> 0.274 </td>\n <td style=\"text-align:center;\"> 0.226 </td>\n <td style=\"text-align:center;\"> 0.891 </td>\n <td style=\"text-align:center;\"> 0.858 </td>\n <td style=\"text-align:center;\"> 0.165 </td>\n <td style=\"text-align:center;\"> 0.601 </td>\n <td style=\"text-align:center;\"> 0.426 </td>\n <td style=\"text-align:center;\"> 0.212 </td>\n <td style=\"text-align:center;\"> 0.606 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.779 </td>\n <td style=\"text-align:center;\"> 0.310 </td>\n <td style=\"text-align:center;\"> 0.443 </td>\n <td style=\"text-align:center;\"> 0.228 </td>\n <td style=\"text-align:center;\"> 0.680 </td>\n <td style=\"text-align:center;\"> 0.268 </td>\n <td style=\"text-align:center;\"> 0.995 </td>\n <td style=\"text-align:center;\"> 0.039 </td>\n <td style=\"text-align:center;\"> 0.357 </td>\n <td style=\"text-align:center;\"> 0.054 </td>\n </tr>\n <tr>\n \n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"2\"> gp70 B.CaseA V1-V2 </td>\n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.621 </td>\n <td style=\"text-align:center;\"> 0.853 </td>\n <td style=\"text-align:center;\"> 0.300 </td>\n <td style=\"text-align:center;\"> 0.458 </td>\n <td style=\"text-align:center;\"> 0.507 </td>\n <td style=\"text-align:center;\"> 0.690 </td>\n <td style=\"text-align:center;\"> 0.506 </td>\n <td style=\"text-align:center;\"> 0.161 </td>\n <td style=\"text-align:center;\"> 0.039 </td>\n <td style=\"text-align:center;\"> 0.880 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.037 </td>\n <td style=\"text-align:center;\"> 0.665 </td>\n <td style=\"text-align:center;\"> 0.808 </td>\n <td style=\"text-align:center;\"> 0.447 </td>\n <td style=\"text-align:center;\"> 0.318 </td>\n <td style=\"text-align:center;\"> 0.296 </td>\n <td style=\"text-align:center;\"> 0.743 </td>\n <td style=\"text-align:center;\"> 0.143 </td>\n <td style=\"text-align:center;\"> 0.403 </td>\n <td style=\"text-align:center;\"> 0.276 </td>\n </tr>\n <tr>\n \n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"3\"> p24 </td>\n <td style=\"text-align:center;\"> 0.0 </td>\n <td style=\"text-align:center;\"> 0.451 </td>\n <td style=\"text-align:center;\"> 0.231 </td>\n <td style=\"text-align:center;\"> 0.986 </td>\n <td style=\"text-align:center;\"> 0.088 </td>\n <td style=\"text-align:center;\"> 0.149 </td>\n <td style=\"text-align:center;\"> 0.051 </td>\n <td style=\"text-align:center;\"> 0.882 </td>\n <td style=\"text-align:center;\"> 0.507 </td>\n <td style=\"text-align:center;\"> 0.384 </td>\n <td style=\"text-align:center;\"> 0.439 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.404 </td>\n <td style=\"text-align:center;\"> 0.828 </td>\n <td style=\"text-align:center;\"> 0.243 </td>\n <td style=\"text-align:center;\"> 0.250 </td>\n <td style=\"text-align:center;\"> 0.689 </td>\n <td style=\"text-align:center;\"> 0.082 </td>\n <td style=\"text-align:center;\"> 0.180 </td>\n <td style=\"text-align:center;\"> 0.846 </td>\n <td style=\"text-align:center;\"> 0.360 </td>\n <td style=\"text-align:center;\"> 0.101 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.182 </td>\n <td style=\"text-align:center;\"> 0.721 </td>\n <td style=\"text-align:center;\"> 0.237 </td>\n <td style=\"text-align:center;\"> 0.676 </td>\n <td style=\"text-align:center;\"> 0.933 </td>\n <td style=\"text-align:center;\"> 0.371 </td>\n <td style=\"text-align:center;\"> 0.042 </td>\n <td style=\"text-align:center;\"> 0.423 </td>\n <td style=\"text-align:center;\"> 0.716 </td>\n <td style=\"text-align:center;\"> 0.147 </td>\n </tr>\n <tr>\n \n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"2\"> ZM96.gp140 </td>\n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.030 </td>\n <td style=\"text-align:center;\"> 0.631 </td>\n <td style=\"text-align:center;\"> 0.244 </td>\n <td style=\"text-align:center;\"> 0.750 </td>\n <td style=\"text-align:center;\"> 0.234 </td>\n <td style=\"text-align:center;\"> 0.949 </td>\n <td style=\"text-align:center;\"> 0.890 </td>\n <td style=\"text-align:center;\"> 0.994 </td>\n <td style=\"text-align:center;\"> 0.090 </td>\n <td style=\"text-align:center;\"> 0.504 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.186 </td>\n <td style=\"text-align:center;\"> 0.353 </td>\n <td style=\"text-align:center;\"> 0.286 </td>\n <td style=\"text-align:center;\"> 0.722 </td>\n <td style=\"text-align:center;\"> 0.889 </td>\n <td style=\"text-align:center;\"> 0.409 </td>\n <td style=\"text-align:center;\"> 0.459 </td>\n <td style=\"text-align:center;\"> 0.376 </td>\n <td style=\"text-align:center;\"> 0.021 </td>\n <td style=\"text-align:center;\"> 0.189 </td>\n </tr>\n <tr>\n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"2\"> CD4+ </td>\n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"2\"> Any ENV PTEG </td>\n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.237 </td>\n <td style=\"text-align:center;\"> 0.555 </td>\n <td style=\"text-align:center;\"> 0.771 </td>\n <td style=\"text-align:center;\"> 0.108 </td>\n <td style=\"text-align:center;\"> 0.237 </td>\n <td style=\"text-align:center;\"> 0.104 </td>\n <td style=\"text-align:center;\"> 0.717 </td>\n <td style=\"text-align:center;\"> 0.084 </td>\n <td style=\"text-align:center;\"> 0.627 </td>\n <td style=\"text-align:center;\"> 0.319 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.248 </td>\n <td style=\"text-align:center;\"> 0.565 </td>\n <td style=\"text-align:center;\"> 0.434 </td>\n <td style=\"text-align:center;\"> 0.103 </td>\n <td style=\"text-align:center;\"> 0.117 </td>\n <td style=\"text-align:center;\"> 0.856 </td>\n <td style=\"text-align:center;\"> 0.861 </td>\n <td style=\"text-align:center;\"> 0.128 </td>\n <td style=\"text-align:center;\"> 0.773 </td>\n <td style=\"text-align:center;\"> 0.723 </td>\n </tr>\n</tbody>\n</table>"
allMDS.html %>% cat(file = 'tables/allMDS.html')
allMDS %>%
kable("latex", escape = F, digits = 3, align = 'c', booktabs = T) %>%
collapse_rows(columns = 1:2, latex_hline = "full") %>%
pass -> allMDS.latex
allMDS.latex %>% cat(file = 'tables/allMDS.tex')
# Print latex table to tex file
cat(docHead, allMDS.latex, docTail, file = 'tables/allMDS.tex')
write_csv(allMDS, 'tables/allMDSGlmPValues.csv')
at each taxonomic level
# How many taxa do we see if we agglomerate at different levels
psN2 %>% tax_table %>% as.data.frame %>% dplyr::select(Phylum:Genus) %>% colnames -> taxLevels
data_frame(taxLevels) %>%
mutate(ntaxa = map(taxLevels,
function(lev){
psN2 %>% tax_glom(lev) %>% ntaxa
}
)) %>%
mutate(ntaxa = unlist(ntaxa)) %>%
pass -> NTaxaAtLevel
NTaxaAtLevel
data_frame(taxLevels = "Species", ntaxa = ntaxa(psN2), ps = list((psN2))) -> specRow
data_frame(taxLevels = "Species", ntaxa = ntaxa(psN1), psCount = list((psN1))) -> specRowC
D2K_savename <- function(distmat){
# cascade names forward with the D2K operation
require(MiRKAT)
out <- MiRKAT::D2K(distmat)
colnames(out) <- colnames(distmat)
rownames(out) <- rownames(distmat)
out
}
# Data frame of phyloseq objects distances and kernels at a bunch of taxonomic levels
NTaxaAtLevel %>%
mutate(ps = map(ntaxa, ~tip_glom_saveid(psN2, k = .))) %>%
# process the phyloseq objects so they have better names
mutate(ps = map(ps, ~swap.phyloseq.taxnames(tag_phyloseq(remove_tag_phyloseq(.)), oldname = 'oldname2'))) %>%
# add in the species data row (which should already have correct names)
bind_rows(specRow) %>%
# calculate jensen-shannon distance matrix
mutate(jsd = map(ps, ~phyloseq::distance(., method = "jsd") )) %>%
# convert to 2d matrix
mutate(jsdMat = map(jsd, ~as.matrix(.))) %>%
# calculate kernel
mutate(kjsd = map(jsdMat, ~D2K_savename(.))) -> tmp
Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.
tmp %>%
mutate(psNoZero = map(ps, ~transform_sample_counts(., function(x) x+(1/1000)))) -> tmp
tmp %>%
## chemometrics::clr just works, while compositions::clr throws a criptic error message here
mutate(clr = map(psNoZero, ~ transform_otu_table(., chemometrics::clr))) %>%
#mutate(clr = map(psNoZero, ~ transform_otu_table(., function(x) as.matrix(compositions::clr(x))))) %>%
pass -> psDf0 # Original way
# Data frame of phyloseq objects distances and kernels at a bunch of taxonomic levels
# I use psN1 because I need count data for some downstream steps.
NTaxaAtLevel %>%
mutate(psCount = map(ntaxa, ~tip_glom_saveid(psN1, k = .))) %>%
# process the phyloseq objects so they have better names
mutate(psCount = map(psCount, ~swap.phyloseq.taxnames(tag_phyloseq(remove_tag_phyloseq(.)), oldname = 'oldname2'))) %>%
# add in the species data row (which should already have correct names)
bind_rows(specRowC) %>%
pass -> tmp
Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.
tmp %>%
# calculate jensen-shannon distance matrix
mutate(ps = map(psCount, ~transform_sample_counts(., function(x) {x/sum(x)}))) %>%
mutate(jsd = map(ps, ~phyloseq::distance(., method = "jsd") )) %>%
# convert to 2d matrix
mutate(jsdMat = map(jsd, ~as.matrix(.))) %>%
# calculate kernel
mutate(kjsd = map(jsdMat, ~D2K_savename(.))) -> tmp
tmp %>%
mutate(psNoZero = map(ps, ~transform_sample_counts(., function(x) x+(1/1000)))) %>%
## chemometrics::clr just works, while compositions::clr throws a criptic error message here
mutate(clr = map(psNoZero, ~ transform_otu_table(., chemometrics::clr))) %>%
#mutate(clr = map(psNoZero, ~ transform_otu_table(., function(x) as.matrix(compositions::clr(x))))) %>%
pass -> psDf
print(psDf)
MirMulti <- function(x, KsDf = psDf, ps = psN2, nperm = 9999){
Ks = KsDf$kjsd
# I bind to the phyloseq object and then peel off again later to guerentee
# that the y-data is in the same order as the Ks
locPS <- phylo_join(ps, x, by = 'pub_id')
ydata0 <- sample_data(locPS)$mag
yna <- is.na(ydata0)
ydata <- ydata0[!yna]
loc.Ks <- lapply(Ks, function(K){K[!yna, !yna]})
bcxJSD <- MiRKAT(y = jac_box_cox_2(ydata), Ks = loc.Ks, out_type = "C", method = 'permutation', nperm = nperm)
medJSD <- MiRKAT(y = medcode(ydata), Ks = loc.Ks, out_type = "D", method = 'permutation', nperm = nperm)
mmDf = data.frame(
taxLevels = KsDf$taxLevels,
ntaxa = KsDf$ntaxa,
bcxJSD = bcxJSD$indivP, medJSD = medJSD$indivP,
bcxJSDOmni = bcxJSD$omnibus_p, medJSDOmni = medJSD$omnibus_p)
mmDf
}
# Test case
use.immune %>%
filter(type == 'IgG' & antigen == 'gp41'& visitno == 2 & ct == 'T') -> test.immune1
test.mm <- MirMulti(test.immune1, Ks = psDf, nperm = 999)
test.mm
ptm = proc.time()
use.immune %>%
group_by(type, antigen, month) %>%
nest %>%
mutate(mir = map(data,
~MirMulti(., Ks = psDf, ps = psN2, nperm = 999)
)) %>%
dplyr::select(-data) %>% unnest(mir) %>%
pass -> mirLevels
proc.time() - ptm
user system elapsed
2.65 0.02 2.67
mirLevels %>% dplyr::select(-ntaxa, -medJSD) %>% spread(key = taxLevels, value = bcxJSD)
mirLevels %>% dplyr::select(-ntaxa, -bcxJSD) %>% spread(key = taxLevels, value = medJSD)
I’d like to combine the above into one table, when it isn’t 7:45. Probably has soemething to do with merging columns or something. Or maybe I just want to plot it as a figure.
mirLevels %>%
gather(metric, P, bcxJSD:medJSD) -> mirDat
mirDat %>% dplyr::select(type:month, bcxJSD = bcxJSDOmni, medJSD = medJSDOmni) %>%
group_by(type, antigen, month) %>%
summarize(bcxJSD = mean(bcxJSD), medJSD = mean(medJSD)) %>%
gather(metric, P, bcxJSD, medJSD) -> mirOmni
mirOmni
NTaxaAtLevel %>% bind_rows(specRow[,1:2]) %>% unite(nLev, taxLevels, ntaxa, remove = FALSE) -> NTaxaAtLevel2
NTaxaAtLevel2
bind_rows(
permKernTable %>% mutate(metric = 'med'),
permKernTableGaus %>% mutate(metric = 'bcx')
) %>%
dplyr::select(type, antigen, month, metric, mir.P) %>%
pass -> WufPData
fixant <- function(df){
df %>%
mutate(antigen = stringr::str_replace_all(antigen, "\\.", " ")) %>%
mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
#mutate(metric = stringr::str_replace_all(metric, "bcx", "")) %>%
pass
}
fixstuff <- function(df){
df %>%
fixant %>%
mutate(metric = stringr::str_replace_all(metric, "JSD", "")) %>%
pass
}
mirDat %>% fixant %>% head
mirDat %>%
#mutate(antigen = stringr::str_replace_all(antigen, "\\.", " ")) %>%
fixstuff %>%
ggplot(aes(x = ntaxa, y = P, col = factor(month), fill = factor(month))) +
geom_point(pch = 21) +
facet_grid(type + antigen ~ metric, labeller = labeller(antigen = label_wrap_gen(width = 10))) +
scale_x_log10(breaks = c(3, NTaxaAtLevel2$ntaxa, 1000), labels = c("omni", NTaxaAtLevel2$nLev, "wunifrac")) +
scale_y_log10(breaks = c(0.002, 0.01, 0.05, 0.2, 1)) +
geom_hline(yintercept=0.05, col = 'blue', alpha = 0.5) + geom_hline(yintercept=0.01, col = 'red', alpha = 0.5) +
#geom_hline(data = mirOmni, aes(yintercept = P, col = factor(month))) +
#annotation_logticks(sides = 'bl') +
#geom_rug(data = mirOmni, aes(y = P, col = factor(month)), inherit.aes = F) +
geom_point(data = mirOmni %>% ungroup %>% fixstuff,
aes(x = 3, y = P, col = factor(month), fill = factor(month)), inherit.aes = F, pch = 22, size = 2) +
geom_point(data = WufPData %>% ungroup %>% fixant,
aes(x = 1000, y = mir.P, col = factor(month), fill = factor(month)), inherit.aes = F, pch = 24, size = 2) +
scale_colour_manual(values=cbPalette) +
scale_fill_manual(values=alpha(cbPalette, 0.5)) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
strip.text.y = (element_text(angle = 90))) -> pjsd0
options(repr.plot.width=8, repr.plot.height= 6)
pjsd0
options(par0)
# I'd like to add weighted unifrac as a tick mark on the right.
NTaxaAtLevel2
# New combined data frame that has omnibus, regular, and wunifrac all in one
bind_rows(
mirDat %>% mutate(metric = stringr::str_replace_all(metric, "JSD", "")) %>%
mutate(test = "JSD"),
mirOmni %>% ungroup %>% mutate(metric = stringr::str_replace_all(metric, "JSD", "")) %>%
mutate(taxLevels = "Omnibus") %>% mutate(test = "Omnibus"),
WufPData %>% ungroup %>% dplyr::rename(P = mir.P) %>%
mutate(taxLevels = "WUnifrac") %>% mutate(test = "WUnifrac")
) %>%
mutate(antigen = factor(antigen, levels = c(ants2, ants1, "ANY.ENV.PTEG"))) %>%
mutate(type = factor(type, levels = c("IgA", "IgG", "CD4+"))) %>%
mutate(taxLevels = factor(taxLevels, levels = c("Omnibus", NTaxaAtLevel2$taxLevels, "WUnifrac"))) %>%
dplyr::select(-c(bcxJSDOmni:medJSDOmni))%>%
unite(nLev, taxLevels, ntaxa, remove = FALSE) %>%
mutate(nLev = stringr::str_replace(nLev, "_NA", "")) %>%
#mutate(antigen = stringr::str_replace_all(antigen, "\\.", " ")) %>%
#mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
mutate(antigen = factor(antigen, labels = stringr::str_replace_all(levels(antigen), "\\.", " "))) %>%
mutate(antigen = factor(antigen, labels = stringr::str_replace_all(levels(antigen), "_", " "))) %>%
# mutate(antigen = factor(antigen, labels = (levels(antigen)))) %>%
# mutate(antigen = factor(antigen, labels = (levels(antigen)))) %>%
mutate(test = factor(test, levels = c('Omnibus', 'JSD', 'WUnifrac'))) %>%
pass -> mirDat2
binding factor and character vector, coercing into character vectorbinding character and factor vector, coercing into character vector
mirDat2 %>% filter(type == 'CD4+')
mirDat2 %>% head
mirDat2 %>%
filter(type == "IgG") %>%
ggplot(aes(x = taxLevels, y = P, col = factor(month), fill = factor(month), shape = test)) +
geom_point(size = 2) +
facet_grid(type + antigen ~ metric, labeller = labeller(antigen = label_wrap_gen(width = 10))) +
#scale_x_log10(breaks = c(3, NTaxaAtLevel2$ntaxa, 1000), labels = c("omni", NTaxaAtLevel2$nLev, "wunifrac")) +
scale_y_log10(breaks = c(0.002, 0.01, 0.05, 0.2, 1)) +
geom_hline(yintercept=0.05, col = 'blue', alpha = 0.5) + geom_hline(yintercept=0.01, col = 'red', alpha = 0.5) +
scale_shape_manual(values = c(22, 21, 24)) +
scale_colour_manual(values=cbPalette) +
scale_fill_manual(values=alpha(cbPalette, 0.5)) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
strip.text.y = (element_text(angle = 90))) -> pjsd
options(repr.plot.width=6, repr.plot.height= 8)
pjsd
ggsave('figures/KernelPVsLevel.png', width = 6, height = 8)
options(par0)
X-axis is now spaced evenly
Table SX. P values of kernel regression tests. Circles indicate jensen shannon values at different taxonomic resolutions. Squares are the omnibus p-value for that cohort of tests. Triangles indicate kernel regression p-values for the corresponding weighted unifrac test.
The blue and red lines indicate p values of 5% and 1% respectively.
Observations: The weighted unifrac test is sensitive. In cases where only one taxonic level hits, weighted unifrac often also falls at some statistically significant value. The omnibus p value is often higher than the weighted unifrac one. Weighted unifrac seems like a good test for identifying patterns at any level that relate to an outcome. The jensen shannon informs us about which level the pattern is observed.
I might even be able to drill down to every level.
model_each_species <- function(ps, f, pthresh = 1, q = FALSE){
# Start with the otu table
ps %>%
# reshape it so we have clr values for every taxon-sample pair
otu_table %>% as.data.frame %>% rownames_to_column("Sample") %>% gather(Taxon, clr, -Sample) %>%
# bind that to the sample data
# doing this here seems remarkably inefficient, but its not creating a bottleneck so I'll leave it.
left_join(
ps %>%
# the sample data need to have MDS1 and MDS2 appended to them
phylo_join(
psN2.pcoa %>% scores(display = "sites") %>% # hardcoded psN2.pcoa
as.data.frame %>%
rownames_to_column %>%
dplyr::select('rowname', 'MDS1', 'MDS2'),
by = 'rowname'
) %>%
# back to binding to sample data
sample_data %>% as('data.frame') %>% rownames_to_column("Sample"),
by = 'Sample') %>%
group_by(Taxon) %>% # group and nest for model run
nest %>%
mutate(Mod = map(data, f)) %>% # apply model over each species
mutate(Glance = map(Mod, glance), Tidy = map(Mod, tidy)) %>% # extract relevant data from model
# view model
dplyr::select(Taxon, Tidy) %>% unnest %>%
mutate(term = gsub('[\\( \\)]','', term)) %>% # remove parentheses from "(Intercept)"
gather(meas, val, estimate:p.value) %>%
unite(meas, term, meas) %>% spread(meas, val) %>% arrange(clr_estimate) %>%
dplyr::select(Taxon, Intercept_estimate, clr_estimate, clr_std.error, clr_p.value) %>%
# add q value
{if(q) mutate(., clr_q.value = p2q(clr_p.value)) else .} %>%
filter(clr_p.value < pthresh) %>%
#Join taxonomy information
left_join(
as.data.frame(ps@tax_table@.Data) %>% as.tibble %>% dplyr::select(Kingdom:Genus, Species, tag) %>%
mutate(tag = as.character(tag)), # mutate so tag is and taxon are both character class
by = c("Taxon" = "tag")) %>%
pass
}
model_each_species_for_antigen <- function(antigen, ps = psN2){
ps %>%
model_each_species(function(df){glm(medcode2(get(antigen)) ~ clr, data = df, family = 'binomial')}, q = TRUE, pthresh = 1)
}
ColsToRun <- c('IgG_Con.6.gp120.B_Month_6.5', 'IgG_Con.6.gp120.B_Month_12', 'IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_gp70_B.CaseA_V1_V2_Month_12', 'IgG_ZM96.gp140_Month_6.5', 'MDS1', 'isMale' )
model_each_species_case <- function(ps){
ps %>% model_each_species(function(df){glm(MDS1 ~ clr, data = df, family = 'gaussian')}, q = TRUE, pthresh = 1) %>%
arrange(clr_estimate) %>%
mutate(Taxon = factor(Taxon, levels = Taxon[order(clr_estimate)])) %>%
mutate(test = 'gaussian', antigen = 'MDS1') %>%
pass -> loc_mds1Glms
ps %>% model_each_species(function(df){glm(log10(IgG_Con.6.gp120.B_Month_12 + 100) ~ clr, data = df, family = 'gaussian')}, q = TRUE, pthresh = 1) %>%
arrange(clr_estimate) %>%
mutate(Taxon = factor(Taxon, levels = levels(loc_mds1Glms$Taxon))) %>%
mutate(test = 'gaussian', antigen = 'Con.6.gp120.B_Month_12') %>%
pass -> loc_gp120Glms
tibble(antigen = ColsToRun) %>% mutate(model = map(antigen, ~model_each_species_for_antigen(., ps = ps))) %>%
unnest %>% mutate(Taxon = factor(Taxon, levels = levels(loc_mds1Glms$Taxon))) %>%
mutate(test = 'binomial') %>%
pass-> loc_logitCoefs
#list(loc_mds1Glms, loc_gp120Glms, loc_logitCoefs)
bind_rows(loc_mds1Glms, loc_gp120Glms, loc_logitCoefs) %>% dplyr::select(test, antigen, everything())
}
#psDf %>% mutate(ps2 = map(ps, ~swap.phyloseq.taxnames(tag_phyloseq(remove_tag_phyloseq(.))))) -> test
#psDf[[1,"clr"]] %>% tax_table
psDf[[1]]
[1] "Phylum" "Class" "Order" "Family" "Genus" "Species"
psDf %>% print
ptm = proc.time()
psDf %>% dplyr::select(taxLevels, ntaxa, clr) %>% mutate(localmod = map(clr, model_each_species_case)) ->psDfLoc
glm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurred
proc.time() - ptm
user system elapsed
43.427 0.208 43.569
print(psDfLoc)
psDfLoc %>% dplyr::select(-clr) %>% unnest(localmod) -> tmp
Unequal factor levels: coercing to characterbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorUnequal factor levels: coercing to characterbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorUnequal factor levels: coercing to characterbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorUnequal factor levels: coercing to characterbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorUnequal factor levels: coercing to characterbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorUnequal factor levels: coercing to characterbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorUnequal factor levels: coercing to characterbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vector
psDfLoc$taxLevels
[1] "Phylum" "Class" "Order" "Family" "Genus" "Species"
tmp %>% mutate(taxLevels = factor(taxLevels, levels = psDfLoc$taxLevels)) -> LocalTests
LocalTests %>%
filter(antigen != "MDS1") %>%
write_csv("tables/AllLocalTests.csv")
LocalTests %>% head
options(repr.plot.width=6, repr.plot.height=8)
LocalTests %>%
filter(antigen != "MDS1") %>%
#Clean up labels
mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " Month", " -- Month")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " IgG ", "")) %>%
ggplot(aes(x = factor(taxLevels), y = clr_q.value)) + geom_point(size = 0.1) +
geom_hline(yintercept = 0.2, color = 'grey') +
geom_hline(yintercept = 0.05, color = 'blue') + geom_hline(yintercept = 0.01, color = 'green') +
facet_wrap(~antigen + test) + scale_y_log10() +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
ggsave('figures/LocalQEveryLevel.png')
Saving 7.29 x 4.5 in image
options(repr.plot.width=6, repr.plot.height=8)
LocalTests %>%
filter(antigen != "MDS1") %>%
#Clean up labels
mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " Month", " -- Month")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " IgG ", "")) %>%
ggplot(aes(x = factor(taxLevels), y = clr_p.value)) + geom_point(size = 0.1) +
geom_hline(yintercept = 0.2, color = 'grey') +
geom_hline(yintercept = 0.05, color = 'blue') + geom_hline(yintercept = 0.01, color = 'green') +
facet_wrap(~antigen + test) +# scale_y_log10() +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
ggsave('figures/LocalPEveryLevel.png')
Saving 7.29 x 4.5 in image
options(repr.plot.width=6, repr.plot.height=8)
LocalTests %>%
filter(antigen != "MDS1") %>%
#Clean up labels
mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " Month", " -- Month")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " IgG ", "")) %>%
ggplot(aes(x = factor(taxLevels), y = clr_p.value)) + geom_point(size = 0.1) +
geom_hline(yintercept = 0.2, color = 'grey') +
geom_hline(yintercept = 0.05, color = 'blue') + geom_hline(yintercept = 0.01, color = 'green') +
facet_wrap(~antigen + test) + scale_y_log10() +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
ggsave('figures/LocalPEveryLevel_LogScale.png')
Saving 7.29 x 4.5 in image
order_taxa_by_mds1 <- function(df){
# this has to be a model_each_species type of data frame
df %>% filter(antigen == 'MDS1' & test == 'gaussian') %>%
mutate(TaxonF = factor(Taxon, levels = Taxon[order(clr_estimate)])) -> mds1df
df %>% mutate(TaxonF = factor(Taxon, levels = levels(mds1df$TaxonF)))
}
To my annoyance, everything is labeled with IgG except gp120_12
LocalTests %>%
filter(taxLevels == 'Family') %>%
order_taxa_by_mds1 %>%
filter(
(antigen %in% c('IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_Con.6.gp120.B_Month_6.5'))|
(antigen == 'Con.6.gp120.B_Month_12' & test == 'gaussian')
) %>%
mutate(antigen = factor(antigen,
levels = c('IgG_gp41_Month_0', 'IgG_gp41_Month_6.5',
'IgG_Con.6.gp120.B_Month_6.5', 'Con.6.gp120.B_Month_12'))) %>%
filter(clr_p.value < 0.05 & clr_q.value < 0.2) %>%
dplyr::select(antigen:clr_estimate) %>%
dplyr::select(-Intercept_estimate) %>%
mutate(cordir = sign(clr_estimate)) %>%
pass
LocalTests %>% pull(antigen) %>% unique
[1] "MDS1" "Con.6.gp120.B_Month_12" "IgG_Con.6.gp120.B_Month_6.5"
[4] "IgG_Con.6.gp120.B_Month_12" "IgG_gp41_Month_0" "IgG_gp41_Month_6.5"
[7] "IgG_gp70_B.CaseA_V1_V2_Month_12" "IgG_ZM96.gp140_Month_6.5" "isMale"
# Family Hits
LocalTests %>%
filter(taxLevels == 'Family') %>%
order_taxa_by_mds1 %>%
filter(
(antigen %in% c('IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_Con.6.gp120.B_Month_6.5',
'IgG_ZM96.gp140_Month_6.5','IgG_gp70_B.CaseA_V1_V2_Month_12'))|
(antigen == 'Con.6.gp120.B_Month_12' & test == 'gaussian')
) %>%
mutate(antigen = factor(antigen, levels = c(
'IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_Con.6.gp120.B_Month_6.5', 'Con.6.gp120.B_Month_12',
'IgG_ZM96.gp140_Month_6.5','IgG_gp70_B.CaseA_V1_V2_Month_12'
))) %>%
pass -> tmp
#tmp$antigen %>% unique
tmp %>% filter(clr_p.value < 0.05 & clr_q.value < 0.2) %>%
pull(Taxon) %>% unique -> useFamily
tmp %>% filter(Taxon %in% useFamily) %>%
#Clean up labels
mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " Month", " -- Month")) %>%
mutate(antigen = stringr::str_replace_all(antigen, "IgG ", "")) %>%
ggplot(aes(x = TaxonF, y = clr_estimate,
color = (clr_p.value < 0.05), shape =(clr_q.value < 0.2))) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = clr_estimate - 2*clr_std.error, ymax = clr_estimate + 2*clr_std.error)) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + geom_hline(yintercept = 0) +
facet_wrap(~antigen, ncol = 1, scales = 'free_y') + xlab("Family Level Taxon")
# Show the censored ones accross - so this would be everything with at least one hit
# but also show what they are in all cases.
ggsave('figures/anyFamilyIgg.png', width = 6, height = 8)
I think its worth digging into clostridia and Prophyromonidaceae with stacked bars
Family level
Lets come back to this after we’ve done the local tests. Since we need them to color code the axes.
psDf %>% print
psDf %>% filter(taxLevels == 'Family') %>% dplyr::select(ps) %>% pull %>%.[[1]]
phyloseq-class experiment-level object
otu_table() OTU Table: [ 39 taxa and 21 samples ]
sample_data() Sample Data: [ 21 samples by 35 sample variables ]
tax_table() Taxonomy Table: [ 39 taxa by 12 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 39 tips and 38 internal nodes ]
print(psDf)
#nFamilyTaxa <- NTaxaAtLevel %>% filter(taxLevels == 'Family') %>% pull(ntaxa)
psDf %>% filter(taxLevels == 'Family') %>% dplyr::select(psNoZero) %>% pull %>%.[[1]] %>%
otu_table %>% as.data.frame %>%
pass -> myRel
ptm = proc.time()
phiBoot <- boot(data = myRel, statistic = boot_phi, R = 1000)
proc.time() - ptm
user system elapsed
7.171 0.023 7.192
ptm = proc.time()
tidyCI <- unwarn(
tidy(phiBoot,conf.int=TRUE,conf.method="bca")
)
proc.time() - ptm
user system elapsed
12.857 0.076 12.900
myRel %>% make_proportionality_matrix %>%
as.data.frame %>%
rownames_to_column("TaxonX") %>% gather(TaxonY, phi, -TaxonX) %>%
filter(TaxonX != TaxonY) %>% data.frame(tidyCI) -> namedTidyCI
head(LocalTests)
LocalTests %>% filter(test == 'gaussian' &
antigen == 'MDS1' &
clr_p.value <0.05 &
clr_q.value < 0.2&
taxLevels == "Family") -> tmp
tmp %>% pull(Taxon) -> MDS1Fam
tmp %>% filter(clr_estimate < 0) %>% pull(Taxon) -> lowMDS1Fam
tmp %>% filter(clr_estimate >= 0) %>% pull(Taxon) -> highMDS1Fam
I’d like to do this, but for gp41 baseline and gp120 as well.
useFamily
[1] "Bacteroidetes.4" "Firmicutes.4" "Firmicutes.2" "Clostridia" "Bacteroides"
[6] "Firmicutes.5" "Porphyromonadaceae.1" "Porphyromonadaceae" "Bacteroidetes.1" "Anaerococcus"
LocalTests %>% head
reshape2::melt
function (data, ..., na.rm = FALSE, value.name = "value")
{
UseMethod("melt", data)
}
<bytecode: 0x562cc04c4148>
<environment: namespace:reshape2>
targStat <- "phi"
namedTidyCI %>% dplyr::select(TaxonX, TaxonY, targStat) %>% spread(key = TaxonY, value = targStat) %>%
remove_rownames %>% column_to_rownames("TaxonX") %>%
as.dist %>% as.matrix -> phidata
phi_dd <- as.dist(phidata)
phi_hc <- hclust(phi_dd)
phidata %>%
#.[phi_hc$order, phi_hc$order] %>% # this way also worked just fine
reshape2::melt() %>%
mutate(Var1 = factor(Var1, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
mutate(Var2 = factor(Var2, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
pass-> tmp
p_phi_1 <- ggplot(tmp, aes(Var1, Var2, fill =(value))) +
geom_tile() +
scale_fill_gradient(high = "grey90", low = "red",
space = "Lab",
name="phi",
limits = c(NA, 3), na.value = "white") +
# theme_minimal()+ # minimal theme
theme(axis.text.x = element_text(
angle = 90, vjust = 1, size = 7, hjust = 1,
face = ifelse(levels(tmp$Var1) %in% useFamily, "bold", "plain"),
colour = ifelse(levels(tmp$Var1) %in% useFamily, "black", "grey30")
),
axis.text.y = element_text(
size = 7,
face = ifelse(levels(tmp$Var1) %in% MDS1Fam, "bold", "plain"),
colour = ifelse(levels(tmp$Var1) %in% lowMDS1Fam, "red",
ifelse(levels(tmp$Var1) %in% highMDS1Fam, "blue", "grey30"))
))+
coord_fixed() +
labs(x = "Family Any Igg",y = "Family MDS1 (red-low, blue-high)" ) +
# rectangles around the three clusters, positioned by eye
geom_rect(aes(xmin = 0 + 0.5, xmax = 10 - 0.5, ymin = 0 + 0.5, ymax = 10 - 0.5),
fill = "transparent", color = "gray20", size = 1.5) +
geom_rect(aes(xmin = 13 + 0.5, xmax = 24 - 0.5, ymin = 13 + 0.5, ymax = 24 - 0.5),
fill = "transparent", color = "gray20", size = 1.5) +
geom_rect(aes(xmin = 23 + 0.5, xmax = 37 - 0.5, ymin = 23 + 0.5, ymax = 37 - 0.5),
fill = "transparent", color = "gray20", size = 1.5)
p_phi_1
# ggsave("figures/phi_vs_mds1_and_igg.png", p_phi_1, width = 6, height = 6)
LocalTests %>%
filter(taxLevels == 'Family') %>%
order_taxa_by_mds1 %>%
filter(
(antigen %in% c('IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_Con.6.gp120.B_Month_6.5',
'IgG_ZM96.gp140_Month_6.5','IgG_gp70_B.CaseA_V1_V2_Month_12'))|
(antigen == 'Con.6.gp120.B_Month_12' & test == 'gaussian')
) %>%
mutate(antigen = factor(antigen, levels = c('IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_Con.6.gp120.B_Month_6.5', 'Con.6.gp120.B_Month_12',
'IgG_ZM96.gp140_Month_6.5','IgG_gp70_B.CaseA_V1_V2_Month_12'))) %>%
pass -> tmp
tmp %>% dplyr::select(antigen, Taxon, clr_estimate, clr_p.value, clr_q.value) %>%
mutate(clr_sign = sign(clr_estimate)) %>%
mutate(isHit = ifelse(clr_p.value < 0.05 & clr_q.value < 0.2, 1, 0)) %>%
mutate(Taxon = factor(Taxon, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
pass -> chorddata
targStat <- "conf.low"
namedTidyCI %>% dplyr::select(TaxonX, TaxonY, targStat) %>% spread(key = TaxonY, value = targStat) %>%
remove_rownames %>% column_to_rownames("TaxonX") %>%
as.dist %>% as.matrix -> phidata
phi_dd <- as.dist(phidata)
phi_hc <- hclust(phi_dd)
phidata %>%
#.[phi_hc$order, phi_hc$order] %>% # this way also worked just fine
reshape2::melt() %>%
mutate(Var1 = factor(Var1, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
mutate(Var2 = factor(Var2, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
pass-> tmp
ggplot(tmp, aes(Var1, Var2, fill =(value))) +
geom_tile() +
scale_fill_gradient(high = "grey90", low = "red",
space = "Lab",
name="phi",
limits = c(NA, 3), na.value = "white") +
# theme_minimal()+ # minimal theme
theme(axis.text.x = element_text(
angle = 90, vjust = 1, size = 7, hjust = 1,
face = ifelse(levels(tmp$Var1) %in% useFamily, "bold", "plain"),
colour = ifelse(levels(tmp$Var1) %in% useFamily, "black", "grey30")
),
axis.text.y = element_text(
size = 7,
face = ifelse(levels(tmp$Var1) %in% MDS1Fam, "bold", "plain"),
colour = ifelse(levels(tmp$Var1) %in% lowMDS1Fam, "red",
ifelse(levels(tmp$Var1) %in% highMDS1Fam, "blue", "grey30"))
))+
coord_fixed() +
labs(x = "Family Any Igg",y = "Family MDS1 (red-low, blue-high)" )-> p_phi_low
p_phi_low
# ggsave("figures/phi_vs_mds1_and_igg.png", p_phi_1, width = 6, height = 6)
targStat <- "conf.high"
namedTidyCI %>% dplyr::select(TaxonX, TaxonY, targStat) %>% spread(key = TaxonY, value = targStat) %>%
remove_rownames %>% column_to_rownames("TaxonX") %>%
as.dist %>% as.matrix -> phidata
phi_dd <- as.dist(phidata)
phi_hc <- hclust(phi_dd)
phidata %>%
#.[phi_hc$order, phi_hc$order] %>% # this way also worked just fine
reshape2::melt() %>%
mutate(Var1 = factor(Var1, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
mutate(Var2 = factor(Var2, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
pass-> tmp
ggplot(tmp, aes(Var1, Var2, fill =(value))) +
geom_tile() +
scale_fill_gradient(high = "grey90", low = "red",
space = "Lab",
name="phi",
limits = c(NA, 3), na.value = "white") +
# theme_minimal()+ # minimal theme
theme(axis.text.x = element_text(
angle = 90, vjust = 1, size = 7, hjust = 1,
face = ifelse(levels(tmp$Var1) %in% useFamily, "bold", "plain"),
colour = ifelse(levels(tmp$Var1) %in% useFamily, "black", "grey30")
),
axis.text.y = element_text(
size = 7,
face = ifelse(levels(tmp$Var1) %in% MDS1Fam, "bold", "plain"),
colour = ifelse(levels(tmp$Var1) %in% lowMDS1Fam, "red",
ifelse(levels(tmp$Var1) %in% highMDS1Fam, "blue", "grey30"))
))+
coord_fixed() +
labs(x = "Family Any Igg",y = "Family MDS1 (red-low, blue-high)" )-> p_phi_high
p_phi_high
# ggsave("figures/phi_vs_mds1_and_igg.png", p_phi_1, width = 6, height = 6)
chorddata %>%
#Clean up labels
mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " Month", " -- Month")) %>%
mutate(antigen = stringr::str_replace_all(antigen, "IgG ", "")) %>%
ggplot(
aes(x = Taxon, y = antigen, alpha = factor(isHit), color = factor(clr_sign))) +
scale_alpha_discrete(range = c(0, 1)) +
guides(alpha = FALSE) +
theme_minimal() +
coord_fixed(ratio = 2) +
scale_colour_manual(values = c("red", "blue")) +
theme(axis.text.x = element_text(
angle = 90, vjust = 0.5, size = 7, hjust = 1,
face = ifelse(levels(chorddata$Taxon) %in% useFamily, "bold", "plain"),
colour = ifelse(levels(chorddata$Taxon) %in% useFamily, "black", "grey30")),
plot.margin = unit(c(0,3,1,3), "cm")
) +
#guides(col = TRUE) +
guides(color=guide_legend(title="Sign of GLM")) +
labs(x = "Family",y = "Antigen -- Month" ) +
geom_point() -> guitar_chords
Using alpha for a discrete variable is not advised.
par <- options()
options(repr.plot.width=10, repr.plot.height= 5)
guitar_chords
options(par)
p_phi_1a <- p_phi_1 +
theme(axis.text.x = element_blank(),
axis.title.x = element_blank(),
plot.margin = unit(c(1, 3, -5.5, 4), "cm"))
par <- options()
options(repr.plot.width=8, repr.plot.height= 8)
p_phi_cord <- cowplot::plot_grid(p_phi_1a, guitar_chords, nrow = 2, align = "v")
p_phi_cord
#phi_legend <- cowplot::get_legend(p_phi_1)
# cowplot::ggdraw(
# cowplot::plot_grid(
# cowplot::plot_grid(p_phi_1a, guitar_chords, ncol = 1, align = "v"),
# cowplot::plot_grid(phi_legend, NULL, ncol = 1),
# rel_widths = c(10,1)
# ))
ggsave('figures/phi_heatmap_withlegend.png', width = 10, height = 10)
options(par)
# More color-blind friendly colorbalettes
#http://colorbrewer2.org/#type=qualitative&scheme=Paired&n=10
cb10 <- c('#a6cee3','#1f78b4','#b2df8a','#33a02c','#fb9a99','#e31a1c','#fdbf6f','#ff7f00','#cab2d6','#6a3d9a')
cb12 <- c('#a6cee3','#1f78b4','#b2df8a','#33a02c','#fb9a99','#e31a1c','#fdbf6f','#ff7f00','#cab2d6','#6a3d9a','#ffff99','#b15928')
# Less color-blind friendly, but still nice.
#https://sashat.me/2017/01/11/list-of-20-simple-distinct-colors/
trub20 <- c('#e6194b','#3cb44b','#ffe119','#0082c8','#f58231','#911eb4','#46f0f0','#f032e6','#d2f53c','#fabebe','#008080','#e6beff','#aa6e28','#fffac8','#800000','#aaffc3','#808000','#ffd8b1','#000080','#808080','#FFFFFF','#000000')
options(repr.plot.width=8, repr.plot.height= 4)
Bookmark Here. Stuck for strange reasons.
ordered_pub_id_df <- psN2 %>% sample_data %>% dplyr::select(pub_id, rMDS1, newname, MDS1) %>% arrange(rMDS1) %>% mutate(MDS1 = round(MDS1, 2), fig3tick = paste(pub_id, MDS1,sep = "_"))
Setting class(x) to multiple strings ("tbl_df", "tbl", ...); result will no longer be an S4 object
ordered_pub_id_df
fig3tick <- ordered_pub_id_df %>% pull(fig3tick)
p_phy <- plot_bar(psN2, x = 'newname', fill = 'Phylum') + scale_fill_manual(values = cb10) + xlab("") +
ggtitle("All Phyla")+ theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
strip.text.y = (element_text(angle = 90))) + scale_x_discrete(labels = fig3tick) + labs(x = "pub_id _ MDS1", y = "Relative Abundance")
p_phy
#ggsave('plots/Phyla_by_wuf1.png')
p_firm <- subset_taxa(psN2, Phylum == 'Firmicutes') %>%
plot_bar( x = 'newname', fill = 'Order') +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
strip.text.y = (element_text(angle = 90))) +
scale_fill_manual(values = cb10) + scale_x_discrete(labels = fig3tick) + labs(x = "pub_id _ MDS1", y = "Relative Abundance")
p_firm
#ggsave('plots/MostFirmicutesAreClostridiales.png')
p_clostridia <- subset_taxa(psN2, Class == 'Clostridia') %>%
plot_bar( x = 'newname', fill = 'Family') + scale_fill_manual(values = cb10) + xlab("") +
ggtitle("Families of Order Clostridiales (All Class Clostridia)")+ theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
strip.text.y = (element_text(angle = 90))) + scale_x_discrete(labels = fig3tick) + labs(x = "pub_id _ MDS1", y = "Relative Abundance")
p_clostridia
# p_porph <- subset_taxa(psN2, Family == 'Porphyromonadaceae') %>%
# plot_bar( x = 'newname', fill = 'Genus') + scale_fill_manual(values = cb10) #+ theme_bw()
# p_porph
# p_bact <- subset_taxa(psN2, Phylum == 'Bacteroidetes') %>% # all class (Bacteroidia), order (Bacteroidales)
# plot_bar(x = 'newname', fill = 'Family') + scale_fill_manual(values = cb10) #+ theme_bw()
# p
# ggsave('figures/Bacteroidetes_Families.png')
p_ClosXI <- subset_taxa(psN2, Family == 'Clostridiales_Incertae_Sedis_XI') %>%
plot_bar( x = 'newname', fill = 'Genus') + scale_fill_manual(values = cb10) + xlab("") +
ggtitle("Genera of Family Clostridiales_Incertae_Sedis_XI")+ theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
strip.text.y = (element_text(angle = 90))) + scale_x_discrete(labels = fig3tick) + labs(x = "pub_id _ MDS1", y = "Relative Abundance")
p_ClosXI
#ggsave('figures/Clostridiales_Incertae_Sedis_XI_Genus.png')
p_Bact <- subset_taxa(psN2, Phylum == 'Bacteroidetes') %>%
plot_bar( x = 'newname', fill = 'Family') + scale_fill_manual(values = cb10) + xlab("") +
ggtitle("Families of Class Bacteroidetes (All Order Bacteroidiales)")+ theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
strip.text.y = (element_text(angle = 90))) + scale_x_discrete(labels = fig3tick) + labs(x = "pub_id _ MDS1", y = "Relative Abundance")
p_Bact
#ggsave('figures/Bacteroides.png')
options(repr.plot.width=14, repr.plot.height= 8)
sb <- cowplot::plot_grid(p_phy, p_Bact,p_clostridia,p_ClosXI,ncol = 2, labels = c("A", "B", "C", "D"))
sb
cowplot::save_plot('figures/stacked_bars.png', sb, base_width = 14, base_height = 8)
cowplot::save_plot('figures/stacked_bars.svg', sb, base_width = 14, base_height = 8)
# psDf %>%
# mutate(OTU = map(ps, ~data.frame(otu_table(.)))) %>%
# mutate(Tax = map(ps, ~data.frame(tax_table(.)))) %>%
# mutate(OTUCount = map (psCount, ~data.frame(otu_table(.)))) %>%
# pass -> psDf1
psDf %>%
mutate(OTU = map(ps, ~data.frame(otu_table(.)))) %>%
mutate(Tax = map(ps, ~as.data.frame(.@tax_table@.Data))) %>%
mutate(OTUCount = map (psCount, ~data.frame(otu_table(.)))) %>%
pass -> psDf1
psDf1 %>%
.[1:5,] %>%
mutate(Tax = map(Tax, ~dplyr::select(.,-oldname2))) %>%
print
# https://stackoverflow.com/questions/50341012/return-the-mapped-object-if-expression-inside-of-purrrpossibly-fails/50341205#50341205
rm_oldname2 <- function(x){
f = possibly(function() dplyr::select(x, -oldname2), otherwise = x)
f()
}
psDf1 %>%
#.[1:5,] %>%
mutate(Tax = map(Tax, rm_oldname2)) %>%
pass -> psDf1b
print(psDf1)
# Show which species level OTUs are contained in each agglomerated group:
psDf1b %>%
.[1:5,] %>%
mutate(TaxIdx = map(Tax, function(df){
df %>%
mutate(tag = as.character(tag), oldGroups = as.character(oldGroups)) %>%
dplyr::select(tag, oldGroups) %>%
mutate(oldGroups = strsplit(oldGroups, ",")) %>%
unnest(oldGroups)
})) %>%
dplyr::select(taxLevels, TaxIdx) %>%
unnest(TaxIdx) %>%
mutate(oldGroups = trimws(oldGroups)) %>% # Some of these have leading or trailing whitespace
spread(taxLevels, tag) %>%
dplyr::select(oldGroups, Phylum, Class, Order, Family, Genus) %>%
pass -> taxGroupMapping
write_csv(taxGroupMapping, 'tables/taxGroupMapping.csv')
# Print out each otu table (relative abundances).
walk2(psDf1b$taxLevels, psDf1b$OTU,
~write.csv(.y, file = paste0("tables/OTU/otu_",.x, ".csv")))
# Print out each otu table (counts).
walk2(psDf1b$taxLevels, psDf1b$OTUCount,
~write.csv(.y, file = paste0("tables/OTU/otuCount_",.x, ".csv")))
# Print out each taxonomy table.
walk2(psDf1b$taxLevels, psDf1b$Tax,
~write_csv(.y, path = paste0("tables/Tax/tax_",.x, ".csv")))
The reviewer asks if it associates with MDS1. I’ll check for associations with immunogenicity as well.
Calculate the alpha diversity values for psN1
bN1 <- breakaway(psN1)
Initial look at alpha diveristy values.
plot(bN1)
Large confidence intervals. Also, probably best to examine in log space going forward.
Range of breakaway estimates
summary(bN1)$estimate %>% range
[1] 151.0282 441.4724
btN_MDS <- betta(summary(bN1)$estimate,
summary(bN1)$error,
make_design_matrix(psN1, "MDS1")
)
btN_MDS
$table
Estimates Standard Errors p-values
(Intercept) 243.9720 16.82711 0.000
predictors -31.7198 27.31450 0.246
$cov
(Intercept) predictors
(Intercept) 287.4724 57.2095
predictors 57.2095 757.4669
$ssq_u
[1] 1932.271
$homogeneity
[1] 6.250059e+01 1.546382e-06
$global
[1] 215.6907 0.0000
$blups
[1] 190.9549 235.6741 241.4954 196.3625 222.4246 270.5355 262.2978 280.2053 194.4011 282.1363 231.4248 227.0622 253.5892
[14] 260.3696 266.7135 248.6283 296.0946 232.9051 201.2419 264.0743 264.8205
Not in any way that is statistically significant (p = 0.246)
Some pre-computing
richEsts <- bN1 %>% map_dbl("estimate")
richCI <- bN1 %>% map_df("interval") %>% t %>% as.data.frame() %>% rename(richLb = V1, richUb = V2) %>% merge(as.data.frame(richEsts), by = "row.names") %>% transform(row.names = Row.names, richEst = richEsts, Row.names = NULL, richEsts = NULL)
# need to add mds1 to this, or just tack this all onto psN1rich
psN1rich <- psN1
sample_data(psN1rich) <- merge(psN1@sam_data, richCI, by = "row.names") %>% transform(row.names = Row.names, Row.names = NULL)
Plot richness vs mds1
psN1rich %>% sample_data %>% as.data.frame %>%
ggplot(aes(x = MDS1, y = richEst)) + geom_point() #+ geom_errorbar()
As above but with error bars.
psN1richP <- psN1rich %>% sample_data %>% as.data.frame %>%
ggplot(aes(x = MDS1, y = richEst, ymin = richLb, ymax = richUb)) + geom_point() + geom_errorbar() + scale_y_log10()
psN1richP
The reviewer actually asked whether unifrac distance associates with alpha diversity. I’ve done that with MDS1, but not distance per se. Lets do one kernel regression, where we ask whether unifrac distance associates with richness.
Kernel regression, weighted unifrac vs richness
mirRich <- MiRKAT(y = richEsts, Ks = wufKN2, out_type = "C", method = 'permutation', nperm = jnperm)
mirRich
[1] 0.22179
Very similar p-value to running betta against MDS1.
PCoA figure that compares MDS1 and MDS2 to richness
psN1richMDSP<- psN1rich %>%
sample_data() %>%
ggplot(aes(x = MDS1, y = MDS2)) + geom_point(aes(fill = richEst), size = 5, stroke = 1, shape = 21) +
viridis::scale_fill_viridis(name = 'richEst', direction = 1, option = "viridis") +
coord_fixed(sqrt(psN2.pcoa$CA$eig[2]/psN2.pcoa$CA$eig[1])) +
#scale_colour_manual(name = 'gp41 Primary', values = c('black', 'grey70')) +
cowplot::theme_cowplot()
psN1richMDSP
Here is a discrete example originally I used Gp120 month 6.5. Switching to gp41 month 6.5, since subsequent analyis suggested that it does show a relationship, and so makes a more useful exemplar
gpLocmtx <- cbind(1,
psN1 %>% sample_data %>% as.matrix %>% as.data.frame %>% pull(IgG_gp41_Month_6.5) %>% as.character %>% as.numeric %>% medcode
)
colnames(gpLocmtx) = c("(Intercept)", "predictors")
btNLoc <- betta(summary(bN1)$estimate,
summary(bN1)$error,
gpLocmtx
)
btNLoc
$table
Estimates Standard Errors p-values
(Intercept) 209.10249 13.30004 0.000
predictors 76.38459 22.30439 0.001
$cov
(Intercept) predictors
(Intercept) 274.4924 -274.4924
predictors -274.4924 771.9782
$ssq_u
[1] 780.9728
$homogeneity
[1] 24.0326989 0.1949011
$global
[1] 323.1194 0.0000
$blups
[1] 184.0879 256.1046 284.6409 195.8995 211.0920 290.4048 217.0720 288.8596 204.3579 291.4781 213.4344 210.8080 231.2472
[14] 222.1869 291.5582 286.9768 296.3785 285.9490 200.8389 280.4231 287.5847
Step 1, make a data frame with BN1, but pub_id for each sample
Pre calculations
SampleNameToPubId <- psN1 %>% sample_data %>% as.data.frame %>% rownames_to_column() %>% as.tibble %>% dplyr::select(rowname, pub_id) %>% na.omit()
Setting class(x) to multiple strings ("tbl_df", "tbl", ...); result will no longer be an S4 object
A function that takes immunogenicity data, x (which we will pass in with tidyverse mapping functions), ad, the breakaway richness summary, and a transformation, defaults to medcode2 of the immunogenicity data.
immuneValpha <- function(x, ad = bN1, transformation = medcode2){
#x <- arrange(x, rowname)
x2 <- right_join(x, SampleNameToPubId, by = 'pub_id')
x3 <- x2[is.finite(x2$mag),]
ad2 <- ad[is.finite(x2$mag)]
class(ad2) <- c("alpha_estimates", "list")
gpXmtx = cbind(1, transformation(x3$mag))
#gpXmtx
thing <- betta(summary(ad2)$estimate,
summary(ad2)$error,
gpXmtx
)
out <- as.data.frame(t(thing$table[2,]))
#colnames(out) = "pval"
# add R^2 value, from simple pearson correlation
locCor <- cor(summary(ad2)$estimate, gpXmtx[,2], method = "pearson")
r2 <- locCor^2
out2 <- data.frame(out, r2)
out2
}
# test a single use case
immuneValpha(use.immune %>% filter(visitno == 9, type == "IgG", antigen == "Con.6.gp120.B"), bN1, transformation = medcode)
Actually run the analysis, raw table
immuneAlphaCompiled <- use.immune %>%
group_by(type, antigen, month) %>%
do(immuneValpha(.)) %>%
ungroup # if you forget to ungroup, pvalue calculations don't work correctly
immuneAlphaTable <- immuneAlphaCompiled %>% mutate(qval = p.adjust(`p.values`, method = "BH"))
immuneAlphaTable
Pretty up the table above for publication
conciseImmuneAlphaTable <- immuneAlphaTable %>%
dplyr::select(Type = type, Antigen = antigen, Month = month, Coef=Estimates, R2 = r2, P=`p.values`, Q = qval)
toAlphaTable <- conciseImmuneAlphaTable %>%
mutate(
Coef = cell_spec(format(Coef, digits = 3), "html",
bold = ifelse(P < 0.05, T, F),
italic = ifelse(P < 0.05 & Coef < 0, T, F),
background = ifelse(P < 0.05, ifelse(Coef < 0, "lightsalmon", "lightblue"), "")),
R2 = cell_spec(round(R2, digits = 3), "html"),
P = cell_spec(format_round(P, 3), "html",
bold = ifelse(P < 0.05, T, F),
background = ifelse(P < 0.05, 'yellow', '')
),
Q = cell_spec(format_round(Q, 3), "html",
bold = ifelse(Q < 0.2, T, F),
background = ifelse(Q < 0.2, 'lightyellow', '')
)
) %>%
mutate(Antigen = gsub('ANY.ENV.PTEG', 'Any ENV PTEG', Antigen)) %>%
mutate(Antigen = gsub('gp70_B.CaseA_V1_V2', 'gp70 B.CaseA V1-V2', Antigen))
toAlphaTable %>% kable("html", escape = F, digits = 3, align = 'c') %>%
kable_styling("striped", "hover", full_width = F) %>%
collapse_rows(columns = 1:2, latex_hline = "full") %>%
pass-> conciseAlphaTable.html
conciseAlphaTable.html %>% cat(file = 'tables/conciseAlphaTable.html')
conciseAlphaTable.html
| Type | Antigen | Month | Coef | R2 | P | Q |
|---|---|---|---|---|---|---|
| CD4+ | Any ENV PTEG | 6.5 | 40.69 | 0.033 | 0.041 | 0.091 |
| 12.0 | 44.76 | 0.006 | 0.013 | 0.037 | ||
| IgA | gp41 | 0.0 | -1.84 | 0.032 | 0.946 | 0.946 |
| 6.5 | 16.30 | 0.145 | 0.534 | 0.628 | ||
| 12.0 | 68.38 | 0.211 | 0.017 | 0.042 | ||
| p24 | 0.0 | 5.46 | 0.007 | 0.827 | 0.871 | |
| 6.5 | 56.58 | 0.16 | 0.010 | 0.033 | ||
| 12.0 | 32.85 | 0.014 | 0.168 | 0.280 | ||
| IgG | Con.6.gp120.B | 6.5 | 21.04 | 0.003 | 0.348 | 0.535 |
| 12.0 | 17.55 | 0.001 | 0.440 | 0.587 | ||
| gp41 | 0.0 | 17.12 | 0.001 | 0.503 | 0.628 | |
| 6.5 | 76.38 | 0.247 | 0.001 | 0.010 | ||
| 12.0 | 61.02 | 0.094 | 0.009 | 0.033 | ||
| gp70 B.CaseA V1-V2 | 6.5 | -5.83 | 0.006 | 0.812 | 0.871 | |
| 12.0 | 70.51 | 0.062 | 0.000 | 0.000 | ||
| p24 | 0.0 | -30.85 | 0.033 | 0.129 | 0.235 | |
| 6.5 | 60.60 | 0.117 | 0.010 | 0.033 | ||
| 12.0 | 47.26 | 0.234 | 0.074 | 0.148 | ||
| ZM96.gp140 | 6.5 | 21.89 | 0.036 | 0.387 | 0.553 | |
| 12.0 | 66.78 | 0.095 | 0.002 | 0.013 |
immuneAlphaCompiledGaus <- use.immune %>%
group_by(type, antigen, month) %>%
do(immuneValpha(., transformation = jac_box_cox_2)) %>%
ungroup
immuneAlphaTableGaus <- immuneAlphaCompiledGaus %>% mutate(qval = p.adjust(`p.values`, method = "BH"))
immuneAlphaTableGaus
conciseImmuneAlphaTableGaus <- immuneAlphaTableGaus %>%
dplyr::select(Type = type, Antigen = antigen, Month = month, Coef=Estimates, R2 = r2, P=`p.values`, Q = qval)
toAlphaTableGaus <- conciseImmuneAlphaTableGaus %>%
mutate(
Coef = cell_spec(format(Coef, digits = 3), "html",
bold = ifelse(P < 0.05, T, F),
italic = ifelse(P < 0.05 & Coef < 0, T, F),
background = ifelse(P < 0.05, ifelse(Coef < 0, "lightsalmon", "lightblue"), "")),
R2 = cell_spec(ifelse(R2 < 0.01, format(R2, digits = 2),round(R2, digits = 2)) , "html"),
P = cell_spec(format_round(P, 3), "html",
bold = ifelse(P < 0.05, T, F),
background = ifelse(P < 0.05, 'yellow', '')
),
Q = cell_spec(format_round(Q, 3), "html",
bold = ifelse(Q < 0.2, T, F),
background = ifelse(Q < 0.2, 'lightyellow', '')
)
) %>%
mutate(Antigen = gsub('ANY.ENV.PTEG', 'Any ENV PTEG', Antigen)) %>%
mutate(Antigen = gsub('gp70_B.CaseA_V1_V2', 'gp70 B.CaseA V1-V2', Antigen))
toAlphaTableGaus %>% kable("html", escape = F, digits = 3, align = 'c') %>%
kable_styling("striped", "hover", full_width = F) %>%
collapse_rows(columns = 1:2, latex_hline = "full") %>%
pass-> conciseAlphaTableGaus.html
conciseAlphaTableGaus.html %>% cat(file = 'tables/conciseAlphaTableGaus.html')
conciseAlphaTableGaus.html
| Type | Antigen | Month | Coef | R2 | P | Q |
|---|---|---|---|---|---|---|
| CD4+ | Any ENV PTEG | 6.5 | 30.50 | 0.13 | 0.016 | 0.064 |
| 12.0 | 27.49 | 9.9e-04 | 0.094 | 0.235 | ||
| IgA | gp41 | 0.0 | -6.51 | 0.03 | 0.718 | 0.818 |
| 6.5 | 2.15 | 0.06 | 0.890 | 0.890 | ||
| 12.0 | 32.88 | 0.09 | 0.045 | 0.150 | ||
| p24 | 0.0 | -6.77 | 2.8e-04 | 0.714 | 0.818 | |
| 6.5 | 15.87 | 0.03 | 0.374 | 0.680 | ||
| 12.0 | 25.20 | 0.06 | 0.226 | 0.452 | ||
| IgG | Con.6.gp120.B | 6.5 | 6.48 | 2.4e-06 | 0.706 | 0.818 |
| 12.0 | 5.36 | 7.9e-05 | 0.736 | 0.818 | ||
| gp41 | 0.0 | 10.14 | 1.2e-04 | 0.510 | 0.774 | |
| 6.5 | 29.66 | 0.18 | 0.002 | 0.010 | ||
| 12.0 | 31.71 | 0.17 | 0.000 | 0.000 | ||
| gp70 B.CaseA V1-V2 | 6.5 | 3.24 | 2.1e-03 | 0.861 | 0.890 | |
| 12.0 | 36.15 | 0.03 | 0.000 | 0.000 | ||
| p24 | 0.0 | -14.39 | 0.1 | 0.421 | 0.702 | |
| 6.5 | 21.71 | 0.14 | 0.175 | 0.389 | ||
| 12.0 | 31.96 | 0.21 | 0.060 | 0.171 | ||
| ZM96.gp140 | 6.5 | 8.40 | 4.7e-04 | 0.542 | 0.774 | |
| 12.0 | 32.21 | 0.09 | 0.000 | 0.000 |
Richness vs alpha above, but this time colorcoding by group.
psN1richP_gp41m6 <- psN1rich %>% sample_data %>% as.data.frame %>%
ggplot(aes(x = MDS1, y = richEst, ymin = richLb, ymax = richUb, fill = as.factor(medcode_hl(IgG_gp41_Month_6.5)))) + geom_point(shape = 21, size = 4) + geom_errorbar() + scale_y_log10() + scale_fill_viridis_d(direction = -1, name = 'gp41 Primary Timepoint') + labs(x = "MDS1", y = "Richness (# ASVs)") + cowplot::theme_cowplot()
psN1richP_gp41m6
Summary figure for of alpha
Combined figure
par <- options()
#options(repr.plot.width=6, repr.plot.height= 11)
#g <- grid.arrange(wuford_gp41, wuford_gp120, ncol = 2)
g <- cowplot::plot_grid(psN1richMDSP, psN1richP_gp41m6, ncol = 1, labels = c("A", "B"), label_size = 24, align = "v")
g
cowplot::save_plot('figures/richnessMDS1Gp41.png', g, base_width = 6, base_height = 8)
This is in response to a reviewr comment. Strategy, write a function to, for one variable of interest, compare the groups, as I did for unifrac distance. I’ll use a simple logistic regression here. Similar to CapVar. I want a use.immune data set that includes whether people are a donor as a column, but that
immune.data %>%
mutate(isDonor = pub_id %in% muDoners) %>%
filter(
(type == 'IgG' &
antigen %in% ants1 &
month %in% c(6.5,12)
) |
(type %in% c('IgG', 'IgA') &
antigen %in% ants2 &
month %in% c(0,6.5,12)
) |
type == 'CD4+' &
antigen == 'ANY.ENV.PTEG' &
month %in% c(6.5, 12)
)-> allparticipants.immune
head(allparticipants.immune)
allparticipants.test <- allparticipants.immune %>% filter(visitno ==9, antigen == "Con.6.gp120.B",type == "IgG")
head(allparticipants.test)
donorTest <- function(x, transformation = medcode2, family = 'binomial'){
loc_glm <- glm(transformation(mag) ~ isDonor, data = x, family = family)
loc_glm %>% broom::tidy() %>% filter(term == 'isDonorTRUE') %>% dplyr::select(-term)
}
donorTest(allparticipants.test)
Do on everything
allparticipants.immune %>%
filter(ct == 'T') %>%
group_by(type, antigen, visitno) %>%
do(data.frame(donorTest(.))) %>%
pass-> DonorEffectTable
DonorEffectTable
Ok. There appears to be no significant difference in magnitude group for any category.
allparticipants.immune %>%
filter(ct == 'T') %>%
group_by(type, antigen, visitno) %>%
do(data.frame(donorTest(., transformation = function(x){jac_box_cox_2(x)},
family = 'gaussian'))) %>%
pass-> DonorEffectGaus
Monkeys, I guess its debug aclock.
DonorEffectGaus
Same deal when I do a normal logistic regression.
save.image(file = "workspace.Rdata")
'package:stats' may not be available when loading